Opsview Knowledge Center

SNMP Polling

Overview of SNMP Polling in Opsview Monitor

What is SNMP Polling?

SNMP Polling is a well-defined and well-understood method of monitoring within the IT monitoring industry. SNMP stands for Simple Network Management Protocol, and is a standard way of monitoring hardware and software from nearly every vendor on the planet; such as Cisco, VMware, Juniper, Microsoft, Linux operating systems and more.

There are two parts to SNMP ' a Network Management Station (NMS) and a Management Agent
(MA).The NMS, Opsview Monitor in this case, communicates with the management agent running
on the hardware/software in question, using SNMP.

The management agent that runs on the hardware / software collects information about the aforementioned hardware/software and presents it in a logical fashion, allowing for it to be polled by the management station (Opsview Monitor).

This 'logical fashion' uses two key concepts; OIDs (Object Identifier) and MIBs (Management Information Base). SNMP works by querying Objects, where an object is something containing data about a specific item within the hardware/software in question, i.e. temperature of a chip, etc.
SNMP identifies Objects like this with an Object Identifier (OID).

OID's are very structured and take a numbered, hierarchical tree structure. Most of the time, OID's are translated into a more readable format, but you still might encounter situations where you will need to use the raw numbers - to find out more see the Guide here: link.

Tied closely to OID's are the Management Information Bases or MIBs. A MIB is like a translator that helps your network management station (NMS) to understand the 'numbers' within the OID. This means that instead of seeing ' 44.03' the MIB will translate and allows Opsview Monitor to display 'CPU0 Temperature: 44.03'. In essence, the MIB makes SNMP objects usable.

MIB's can be downloaded from the hardware/software vendor and loaded into Opsview Monitor by installing them into your distribution's designated MIB directories.

Note: If you are using SNMP Traps, ensure that you copy your custom MIBs to '/usr/local/nagios/snmp/load'.

In the example below, the first group of text is missing a MIB file and is not able to fully translate the OID's into a human-readable format (you can see the ., for example.) The second group of text is able to fully translate the OID's using a MIB.

SNMPv2-SMI::transmission. = Gauge32: 0
SNMPv2-SMI::transmission. = Gauge32: 0
SNMPv2-SMI::transmission. = Gauge32: 1151944
SNMPv2-SMI::transmission. = Gauge32: 0
SNMPv2-SMI::transmission. = Gauge32: 0
SNMP output without a MIB

MIB::adslAtucChanCrcBlockLength.4 = Gauge32: 0 byte
ADSL-LINE-MIB::adslAturChanInterleaveDelay.4 = Gauge32: 0 milli-seconds
ADSL-LINE-MIB::adslAturChanCurrTxRate.4 = Gauge32: 1151944 bps
ADSL-LINE-MIB::adslAturChanPrevTxRate.4 = Gauge32: 0 bps
ADSL-LINE-MIB::adslAturChanCrcBlockLength.4 = Gauge32: 0
SNMP output with a MIB

SNMP Security

There are three versions of the SNMP protocol supported in Opsview Monitor:

  • SNMP v1
  • SNMP v2c
  • SNMP v3

SNMP v1 and SNMP v2c are very similar in their configuration; an administrator configures a field known as the community string which is the authentication string (password, essentially) that the NMS needs to get the data from the MA (i.e. how Opsview Monitor can log in to the router to get the information about it).

SNMP v3 is more secure in that it allows an administrator to set a username, an authentication algorithm, an authentication password, a privacy algorithm AND a privacy password ' all of which must be entered correctly within Opsview Monitor in order to allow access to the router/devices information.

In the screen below, both SNMP v1/v2c and v3 are configured:

Example SNMP configuration on a Draytek router

Example SNMP configuration on a Draytek router

These credentials must be entered into Opsview Monitor in order to allow the monitoring of the Host in question.

Configuring a Host for SNMP Polling

To configure a Host so that Opsview Monitor is able to poll it for information, you should configure the 'SNMP' tab within the Host edit modal window. This is covered within detail within Section Configuring a host: 'SNMP' tab

Configuring a New SNMP Polling Check

To configure a new SNMP polling check, navigate to Service Checks; this is located within the 'Settings' tab in the overlay menu, as below:

Menu with 'Service Checks' highlighted

Menu with 'Service Checks' highlighted

Once within the Service Checks window, click on the 'Add New' button in the top left and then click on SNMP Polling.

'Add New > SNMP Polling' within Service Checks window

'Add New > SNMP Polling' within Service Checks window

Once 'SNMP Polling' has been clicked, a window similar to the one below will load:

New SNMP Polling service check

New SNMP Polling service check

The window is split into two tabs:

  • Details: This is where you can configure various Service Check related fields, such as the name, description, its Service Group, its Host templates and more
  • SNMP Polling: The SNMP polling specific tab ' this is where an 'SNMP Walk' can be ran against a specified host, retrieving the OID's which can be monitored.

Details Tab: Basic

The Details tab is split into two drawers, 'Basic' and 'Advanced'.

The items within 'Basic' are the most commonly used fields for Service Check configuration:

  • Name: The name of the service check, i.e. 'Cisco 3750 Stack configuration status'.
  • Description: A friendly description of the Service Check, i.e. 'A custom SNMP check that returns the status of the switch in the context of its stack configuration. Apply this to all stacked Cisco 3750's.'
  • Service group: Covered in Section Service Group, a Service Group is a container for one or more Service Checks and are used for alerting and access control, amongst others.
  • Host templates: Covered in Section, a Host template can contain one or more Service Checks from any Service Group. While a Service Check can only ever belong to one Service Group, it can belong to as many Host templates as you desire.
  • Check period: Covered in Section 'Overview ' an introduction to Time Periods', the check period defines when the Service Check runs. Generally, this is set to 'inherit from host', meaning if the Host is set to be monitored between 9:00 am and 5:00 pm, then the Service Check will also only run between 9:00 am and 5:00 pm.
  • Check interval: The interval between Service Check execution, i.e. if set to 5m the Service Check will run every 5m; if set to 30s the Service Check will run every 30 seconds.

Details Tab: Advanced

The items within 'Advanced' are the less used, more 'advanced' Service Check options:

  • Hashtags: The hashtags which this Service Check will belong to, when applied to one or more hosts.
  • Globally applied hashtags: If the Service Check has been added to a Hashtag via the 'Settings > Hashtags' section instead of the selection box above, then the Hashtags will be listed here. To remove the Service Check from the Hashtag listed here, you should edit the Hashtag within 'Settings > Hashtags'.
  • Dependencies: Dependencies allow you to set a parent/child relationship for the Service Check, i.e. for this SNMP polling check, we may choose to have a parent Service Check of 'TCP Port 161'. This means that if the Service Check 'TCP Port 161' changes to a critical state (i.e. SNMP is DOWN), then this Service Check and all other Service Checks that are a child of the aforementioned service check will change to an UNREACHABLE state and will not re-check until the parent Service Check returns to an 'OK' state. This not only reduces the work load of the Opsview Monitor server but also reduces alerts; Opsview Monitor will only alert for the 'TCP Port 161' failure and not for all of its dependent children.
  • Maximum check attempts: This field determines the number of times a Service Check has to fail for the Service Check to change into a 'hard state'. In Opsview Monitor 5.0 there is the concept of 'soft' and 'hard' states. When a Service Check fails and the Service Check changes into the 'CRITICAL' state it is considered a 'soft' state. After the Service Check has failed for the number of times specified in this field is considered a 'hard' state, i.e. not a temporary blip, etc. You can use hard states so that you are only notified when a Service Check is truly CRITICAL. The interval used here is not the 'check interval' but the 'Retry interval'.
  • Retry interval: A separate field to the 'Check interval', the 'Retry interval' is only used when a Service Check goes into a 'CRITICAL' / 'WARNING' / 'UNKNOWN' state. For a Service Check to go from a 'soft' state to a 'hard' state, the Service Check must fail $X number of times, where $X is the value set in this field. For example, if the Retry Interval is 1m and the Max Check Attempts is set to three, the Service Check will run once a minute for three minutes ' after which if the Service Check is still 'CRITICAL' it will change from a 'soft DOWN' to a 'hard DOWN'.
  • Notify for service on This section determines which states the Service Check should notify on, i.e. only on 'CRITICAL' or 'UNKNOWN', for example. Note: If a Host does not notify on any states, then the Service Checks on that Host will also not send any Notifications.
  • Notification period: This field uses the 'Time Periods' already defined within the Opsview Monitor system, and determines when Notifications are allowed to be sent to Users.
  • Re-notification interval: This field determines the period of time (in hours, minutes or seconds) after which a Notification is re-sent if the Host is still unhandled (i.e. the problem has not been ACKNOWLEDGED). If this is set to '0', only the first Notification is sent (when the Host changes to the 'HARD' state).
  • Create Multiple Services: If a Variable is selected within this drop-down, for each Variable of the selected type added a new Service Check will be added with the value in the Variable added to the Service Check name. I.e. if we have 'Disk Capacity' as a Service Check with '%DISK%' selected in the 'Create Multiple Services: drop down', then if four Variables are added via the 'Variables' tab ' 4 Service Checks will be added 'Disk Capacity: Value1, Disk Capacity: Value2', and so forth.
  • Flap Detection: A service is considered flapping if its state changes too much. If this option is set, any services will be checked for this flapping condition and an icon will appear for the service and Notifications will be temporarily disabled until the service comes out of a flapping state.
  • We recommend that flap detection is enabled for active checks. However if you find a service is flapping frequently, there is probably another issue that needs investigating.
  • We recommend that flap detection is disabled for passive checks.
  • Sensitive arguments: If the Service Check is a plugin-based one, then the Sensitive Arguments checkbox allow you to determine if the arguments for the Service Check are displayed within the 'Test Service Check' tab within the investigate mode. If the flag is checked, the arguments will be hidden ' if unchecked the arguments will be shown. If a User has TESTCHANGE set within their Role, you will be able to modify the arguments before testing the Service Check.
  • Record Output Changes: Normally, the output of a Service Check is only recorded when the state of that service changes. For example, assuming a new check has been set up:
State Output Output Recorded
OK Service OK: 10% Yes
OK Service OK: 15% No
OK Service OK: 15% No
OK Service OK: 20% No
CRITICAL Service warning: 80% Yes
CRITICAL Service warning: 75% No
WARNING Service warning: 70% Yes
WARNING Service warning: 40% No
WARNING Service warning: 40% No
OK Service OK: 20% Yes
OK Service OK: 18% No
  • This option instead causes every change of output to be logged regardless of change of state (for the selected state changes). For example, for the same sequence above with OK and WARNING selected:
State Output Output Recorded
OK Service OK: 10% Yes
OK Service OK: 15% Yes
OK Service OK: 15% No
OK Service OK: 20% Yes
CRITICAL Service warning: 80% Yes
CRITICAL Service warning: 75% NO - CRITICAL option was not selected
WARNING Service warning: 70% Yes
WARNING Service warning: 40% Yes
WARNING Service warning: 40% No
OK Service OK: 20% Yes
OK Service OK: 18% Yes
  • Alert every failure: This option forces a Notification to be sent on every check in a non-OK state. This is useful if you have a passive Service Check which receives results.
  • There are three states for this option:
    • Disabled: only get alerts on state changes
    • Enabled: get alerts for every failed state. This overrides the re-notification interval option
    • Enabled with re-notification interval: get alerts for every failed state as long as the re
  • Notification interval has passed. This is useful if you get a lot of results in quick succession

Note: The Notification number will increase for every non-OK result and only gets reset to zero when an OK state is received.

  • Event handler: Covered in greater detail in the 'Event handler' section of the User Guide, Event Handlers are scripts that can be triggered when a Service Check goes into a 'CRITICAL', 'WARNING' or 'UNKNOWN' state (soft/hard, depending on the event handler script). The script can do anything you like, but a common usage includes restarting a service or server (virtual machine, for example) via an API.
  • Markdown filter: If this option is chosen, then the service output will be filtered through the Markdown plugin. This allows you to mark up the output with bold, italics and URL links. For instance, if the output is:

Disk failure on sd1 - see internal wiki

This will be displayed as:

Disk failure on sd1 - see internal wiki

Use http://daringfireball.net/projects/markdown/dingus to test your plugin output. Bear in mind that you cannot use the pipe symbol as Nagios Core interprets this as the start of performance data.
Also, < and > characters are converted to the HTML entities so you cannot embed other HTML tags, and finally you should keep to only one line due to NSCA limitations in a distributed environment. Therefore, you should stick to using just bold, italics and links in your output.
Note: If your plugin returns HTML output, this will be displayed as the text. You must use markdown format if you want to use links.

SNMP Polling Tab

Once you have configured the relevant options within the 'Details' tab, you can click on the SNMP polling tab:

The main user journey for the SNMP Polling tab is:

  • Choose a Host (the Host must be added to Opsview Monitor and have the 'SNMP:' tab configured correctly.
  • Run 'SNMP Walk' against the Host. This will list the OID's of the Host, either unformatted (…) or formatted (using MIB’s).
  • Select an OID and it will be populated into the OID field.
  • Add the calculations using the Warning/Critical sections, i.e. if we are monitoring a temperature, set the warning field to '>' for the numeric comparison and the value to '40', and the same for critical but with a higher value. This means, if the temperature goes above 40, set the Service Check to warning, if it goes above X set it to critical.
  • Click submit changes and apply to one or more Host(s).

The first step is to run the SNMP walk. You must select a host from the drop-down box:

Once selected, the 'SNMP Walk' button will be available to select. Clicking the 'SNMP Walk' button will retrieve all of the OID's from the Host in question, as below:

Once the OID's are listed, you can choose to 'Rescan', which will re-discover all of the OID's on the Host. This is great in the event that you haven't loaded the MIB and have a lot of OID's, as above. In this case, you can add the MIB's and rescan which should convert the numerical OID's into a formatted, human- understandable version,

Next, select an OID which requires monitoring ' in the example below we are going to monitor uptime, therefore we have clicked on the 'Timeticks: (….’ Line:

Next, modify the label, calculation rate and the warning/critical values as required:

Click 'Submit Changes' and the new Service Check is created! See Section Service Checks Tab for guides on how to add the newly-created Service Check to a Host.