Now that we are
monitoring our Hosts and the services running on them, we can start to
interpret the data intelligently, i.e. “how do we present this
so that we can clearly see that a problem is occurring?”
Commonly in IT,
servers and network devices are part of larger systems such as business applications,
websites, services, etc., and these are the items we are really concerned about,
rather than the individual components they consist of.
For example, if I run a website
selling product XYZ, I want to know when this website, and what in
this website, is impacted by an issue in any of the IT components used to deliver it. We know that
this website is made up of a few Apache servers, running on two Linux servers,
connected to the internet via a router and a switch, and also relying on a DNS
server. Using the “monitoring layer” outlined in the previous section, we are using service
checks to monitor performance metrics for each of these areas individually:
Apache: number of requests per second, number of Apache processes, etc.
Linux servers: CPU usage, memory usage, hard disk drive space, temperature, etc.
Switch/Router: CPU load, RAM usage, interface throughput, packets per second, etc.
DNS Server: DNS service running, performance counters, queues, etc.
This gives us a
great view into how each of these Hosts is doing. What we want now is to look
at it holistically, as a website
rather than a series of objects. To do this, monitoring software vendors and application performance monitoring (APM) vendors have created “business service monitoring” (BSM).
The purpose of
these business services is to give you a view into the performance of your
business applications ‘as a whole’, rather than having to look at
all the individual Hosts and work it out for yourself. This capability is what differentiates a basic monitoring tool from an enterprise-class one, such as Opsview Monitor. This ability to measure, interpret and manage services and
“top down views” delivers much greater value to the sysadmin. An example of a BSM view is shown below.
Within Opsview Monitor, we have two features which help you with this: Business Service Monitoring (BSM) and Hashtags.
BSM allows you to monitor your true business service, i.e. the website,
application, etc., instead of looking at all the individual Hosts and Service
Checks underneath. This allows you to set up notifications only on Business
Services, so instead of getting emails every five minutes about ‘This
individual server is using too much memory’, you will only get a notification
when your Business Service is impacted (i.e. something critical within this
Business Service has failed that may impact its performance), or if the Business
Service is DOWN. This significantly
reduces ‘alert fatigue’ and ensures you only get notifications when you
really need them.
A business service is an important part of your business, such as your public website.
It will consist of multiple components, which are groups of Hosts with
similar Service Checks. Opsview Monitor will calculate the state of components, taking
into account redundancy and failover, so that you can get up-to-date
information on each layer. This is then aggregated to the business service
level to give an overall health for business and technical owners, as shown in the
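
To make this roll-up concrete, here is a minimal Python sketch of how component and business service states could be derived. It is not Opsview Monitor's actual algorithm: the `Component` class, the `required_ok` threshold (how many healthy Hosts a component needs before it is considered down) and the three-level `State` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class State(IntEnum):
    """Hypothetical health levels, ordered so that max() picks the worst."""
    OK = 0
    IMPACTED = 1
    DOWN = 2


@dataclass
class Component:
    """A group of Hosts with similar Service Checks (illustrative model)."""
    name: str
    host_ok: dict[str, bool]  # host name -> are all of its checks passing?
    required_ok: int          # assumed redundancy threshold

    def state(self) -> State:
        healthy = sum(self.host_ok.values())
        if healthy == len(self.host_ok):
            return State.OK
        # Redundancy: the component keeps working while enough Hosts are healthy.
        if healthy >= self.required_ok:
            return State.IMPACTED
        return State.DOWN


def business_service_state(components: list[Component]) -> State:
    """Aggregate component states; the worst component decides the result."""
    return max((c.state() for c in components), default=State.OK)


def should_notify(state: State) -> bool:
    """Notify only when the business service itself is impacted or down."""
    return state != State.OK


website = [
    Component("apache", {"web1": True, "web2": False}, required_ok=1),
    Component("dns", {"dns1": True}, required_ok=1),
]
print(business_service_state(website).name)  # IMPACTED
```

In this model the failure of `web2` alone marks the website as impacted rather than down, because one Apache Host is assumed to be enough to keep serving traffic.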
For example, if you
have a data center called ‘Texas01’, you could tag every Host and Service Check
within that data center with the Hashtag ‘#texas01’. You can then view the
health of the Texas data center via the ‘Hashtags’ section within Opsview Monitor.
In the above example, we can see that there is a Service Check or Host within our Texas
data center that is in a problem state (i.e. CRITICAL, DOWN, etc.).
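
The idea behind a Hashtag view can be sketched in a few lines of Python. This is an illustrative model rather than Opsview Monitor's implementation: the `MonitoredObject` class, the set of problem states and the `hashtag_problems` helper are all hypothetical.

```python
from dataclasses import dataclass, field

# Assumed set of states that count as a problem (illustrative only).
PROBLEM_STATES = {"WARNING", "CRITICAL", "DOWN", "UNKNOWN"}


@dataclass
class MonitoredObject:
    """A Host or Service Check carrying zero or more Hashtags (hypothetical)."""
    name: str
    state: str
    hashtags: set[str] = field(default_factory=set)


def hashtag_problems(objects: list[MonitoredObject], hashtag: str) -> list[str]:
    """Return the names of objects with this Hashtag that are in a problem state."""
    return [
        obj.name
        for obj in objects
        if hashtag in obj.hashtags and obj.state in PROBLEM_STATES
    ]


inventory = [
    MonitoredObject("web1", "UP", {"#texas01", "#website"}),
    MonitoredObject("web1:disk", "CRITICAL", {"#texas01"}),
    MonitoredObject("dns1", "DOWN", {"#london01"}),
]

print(hashtag_problems(inventory, "#texas01"))  # ['web1:disk']
```

An empty result for a Hashtag would correspond to a healthy view; any entry in the list is the kind of object that would show the data center in a problem state.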
Once you have
configured your Hashtags and Business Services, you can not only monitor based
upon them, i.e. using the above views to see the actual “health” of your Website
/ Data Center, but also use these Hashtags/Business Services within other
sections of Opsview Monitor. One example is filtering the ‘Events Viewer’ to
show only events that have happened on objects tagged with a given Hashtag, as shown below:
This allows you
to see all the events that have happened on any piece of hardware or software
(from applications through to switches) that is related to the performance of
your data center, website, etc.
Hashtags/Business Services can also be used within ‘Graph Center’, to display
graphically all the Service Checks tagged with a given Hashtag or all the Service
Checks that are within a given component:
Hashtags / Business Services can also be used in the Reports section, to show the historical health of a Hashtag/Business Service in a number of standard business reports, such as SLA
reports or cost of downtime reports, through to technical reports such as
performance reports, availability reports, etc.
Reports can also be automated, so that you can see at configurable time intervals, such as the start of each month or every day,
the performance/availability/cost of downtime for any pre-defined Hashtag,
delivered in .pdf format and carrying your brand, directly into your inbox or a manager’s inbox each morning!
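
As a worked example of the arithmetic behind availability and cost of downtime reports, the following Python sketch computes both for a single reporting period. The availability formula is the usual uptime ratio; the function names, the 30-day period and the hourly cost figure are assumptions for illustration, not values or calculations taken from Opsview Monitor.

```python
def availability_percent(period_hours: float, downtime_hours: float) -> float:
    """Share of the reporting period during which the service was available."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    return 100.0 * (period_hours - downtime_hours) / period_hours


def cost_of_downtime(downtime_hours: float, cost_per_hour: float) -> float:
    """Simple linear model: every hour of downtime costs the same amount."""
    return downtime_hours * cost_per_hour


# Hypothetical figures: a 30-day month with 3 hours of outage.
period = 30 * 24  # 720 hours
downtime = 3.0

print(f"{availability_percent(period, downtime):.2f}%")  # 99.58%
print(cost_of_downtime(downtime, cost_per_hour=500.0))   # 1500.0
```

A real report would derive the downtime figure from the recorded state history of the Hashtag or Business Service rather than from a hard-coded number.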
This section has demonstrated how monitoring functions within a tiered approach,
from simple monitoring of whether a device is responsive, through using
business service monitoring (BSM) to view the health of services, to reporting the cost and availability impact of incidents.
There is a
huge range of other advanced capabilities within Opsview Monitor that build on these, including:
problem resolution using Event Handlers
periods using Scheduled Downtime
analysis using Network Analyzer
control using Hashtags / Business Services / more
using one of 13 different notification methods, such as Slack or Email.
These are all explored in detail in Section 4, ‘User Guide’.