In this section, we discuss several design considerations to aid in planning your Opsview Monitor system, such as scalability considerations, resilience, disk partitioning and security.
Later in this section, we discuss how Opsview Monitor uses databases.
When deploying your Opsview Monitor server, bear in mind the variables that affect your system and how many Hosts can actually be monitored. In particular, be mindful of the following factors:
- The number of Service Checks per Host.
- The median interval for Service Checks.
- The type of checks being executed, that is, quality of plugin code, local execution vs. agent queries and so on.
- Network latency.
Typically, we recommend 300 Hosts as a comfortably manageable limit for a single Opsview Monitor server. This recommendation rests on the assumptions listed below, which together result in approximately ten Service Checks per second being executed by the monitoring server:
- 10 Service Checks per Host (average).
- Five minute interval per Service Check (average).
- The majority of Service Checks are made against a remote agent; for example, Nagios® Remote Plugin Executor (NRPE) or Simple Network Management Protocol (SNMP).
- The majority of monitored Hosts are on the same Local Area Network (LAN).
- The system is a modest physical or virtual server with 4-8 CPU cores and 16GB RAM.
With appropriate tuning and use of better hardware, however, a single server can typically be made to scale well beyond 300 hosts.
Opsview Monitor's Distributed Monitoring architecture can also be used to monitor a much larger number of Hosts. Please refer to that section for more details.
When designing a system, one of the most important metrics to consider is 'service checks per second', which is a function of both the total number of checks configured and the interval between those checks. Generally, we recommend no more than around 20 service checks per second on a single machine. For example, around 2000 hosts with ten checks per host on a five minute interval clearly exceeds our recommended checks per second.
An example configuration, which exceeds our recommended checks per second:
2000 (hosts) * 10 (service checks) / 300 (seconds) = 66 (service checks per second)
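The arithmetic above can be reproduced with a small shell helper. This is a minimal sketch for capacity planning; the function name checks_per_second is illustrative and is not an Opsview Monitor command. Note that the exact result for the 2000-host example is 66.7, which the text rounds down to 66.

```shell
#!/bin/sh
# checks_per_second HOSTS CHECKS_PER_HOST INTERVAL_SECONDS
# Prints the average number of service checks executed per second.
# (Hypothetical planning helper; not part of Opsview Monitor.)
checks_per_second() {
    awk -v hosts="$1" -v checks="$2" -v interval="$3" \
        'BEGIN { printf "%.1f\n", hosts * checks / interval }'
}

checks_per_second 300 10 300     # single-server baseline: 10.0
checks_per_second 2000 10 300    # the example above: 66.7
```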
To bring the checks per second within our recommended guidelines, we would need to attach three collectors to the master server, which gives a rate of 22 checks per second per collector. Moreover, if each CPU core of the collector systems handles a separate worker thread, the checks per second (66) can be divided further by the number of cores the collector servers possess. For example, with 2 x dual core CPUs (four cores) in the collector servers, the number of checks per second for each core falls to 16.5:
Utilizing CPU cores can further reduce checks per second:
66 (service checks per second) / 4 (number of cores) = 16.5
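Both divisions in this example (by collectors and by cores) follow the same pattern, sketched below. The helper name split_rate is illustrative, and the sketch assumes the load is spread evenly, which a real system may not achieve.

```shell
#!/bin/sh
# split_rate RATE PARTS
# Divides a checks-per-second rate evenly across collectors or CPU cores.
# (Hypothetical helper; assumes a perfectly even distribution of checks.)
split_rate() {
    awk -v rate="$1" -v parts="$2" 'BEGIN { printf "%.1f\n", rate / parts }'
}

split_rate 66 3    # three collectors: 22.0 per collector
split_rate 66 4    # four cores: 16.5 per core
```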
The Opsview Monitor distributed architecture combines scalability and resilience. Resilience can be enhanced further by 'doubling' the components that comprise your system, as in the following list:
- Master server (active)
- Master server (standby)
- Database server #1: 'opsview' and 'runtime', replica of 'odw'
- Database server #2: 'odw', replica of 'opsview' and 'runtime'
- Collector clusters:
  - Collector cluster #1
  - Collector cluster #2
  - Collector cluster #3
  - Collector cluster #4
  - Collector cluster #5
Note: When assessing a collector cluster, you should allow the possibility of at least one node failure. If a collector cluster is nearing capacity, then the failure of one node may cause other nodes to exceed capacity. Also, there should always be an odd number of nodes within a collector cluster; 1, 3, 5, etc. This is to help with resiliency and to avoid split-brain issues when clustering the components on the servers.
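The two rules in this note (an odd node count, and headroom for one node failure) can be checked with a short sketch like the one below. The function name cluster_ok and the per-node limit you pass in are assumptions; use whatever capacity you have measured for your own collectors. A single-node cluster is rejected here because it has no surviving node to absorb the load.

```shell
#!/bin/sh
# cluster_ok CLUSTER_RATE NODES PER_NODE_LIMIT
# Succeeds when NODES is odd and greater than one, and the surviving nodes
# stay within PER_NODE_LIMIT checks per second after one node fails.
# (Hypothetical helper; the limit is a value you choose, not an Opsview setting.)
cluster_ok() {
    rate=$1 nodes=$2 limit=$3
    # An even node count makes split-brain harder to avoid.
    [ $((nodes % 2)) -eq 1 ] || return 1
    # A single node leaves nothing to fail over to.
    [ "$nodes" -gt 1 ] || return 1
    # Compare the per-node rate with one node missing against the limit.
    awk -v r="$rate" -v n="$nodes" -v l="$limit" \
        'BEGIN { exit !(r / (n - 1) <= l) }'
}

cluster_ok 30 3 20 && echo "30 checks/s on 3 nodes: headroom for one failure"
cluster_ok 60 3 20 || echo "60 checks/s on 3 nodes: over the limit after one failure"
```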
Note: It is not possible to run two Opsview Monitor master servers in active/active configuration, only active/passive. Running a second master node in either High Availability or Disaster Recovery configuration requires an HA or DR subscription from Opsview.
In this section, we detail the sizes of the disk partitions that are needed to operate the Opsview Monitor software and its software dependencies.
In general, one large root partition is sufficient, although we recommend the root and /var directories have at least 1GB of disk space available. In the following list, we provide further information about other areas and their recommended sizes.
- root: We recommend at least 2GB is set aside for the operating system, allowing for any upgrades and so on.
- /boot: We recommend a separate boot partition of at least 256MB.
- /opt/opsview: All Opsview Monitor software and runtime data are stored in this location. We recommend a minimum of 10GB.
Opsview Monitor uses a temporary directory (/tmp by default) when running opsview-web and other related applications. You can set a system level environment variable, namely TMPDIR=/, if you wish to use an alternate area.
The database can either be located on the master or on a separate server. Nonetheless, in both instances, the Opsview Monitor database and backups are located in the /var directory and we provide our recommended size below.
- /var: We recommend that this directory has more than 100GB available when used in conjunction with the ODW; however, if ODW is not used, then 50GB is sufficient for small and medium sized systems.
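Before installing, you may want to confirm that the target filesystems meet these recommendations. The following is a minimal sketch using df; the helper name has_free_gb is illustrative, and the thresholds are the 10GB and 50GB figures from the lists above (raise the /var figure to 100 if you use the ODW).

```shell
#!/bin/sh
# has_free_gb PATH MIN_GB
# Succeeds when the filesystem holding PATH has at least MIN_GB available.
has_free_gb() {
    # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte blocks.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    awk -v kb="$avail_kb" -v gb="$2" 'BEGIN { exit !(kb >= gb * 1024 * 1024) }'
}

# Thresholds taken from the recommendations above; paths that do not exist are skipped.
for check in "/opt/opsview 10" "/var 50"; do
    set -- $check
    if [ -d "$1" ] && ! has_free_gb "$1" "$2"; then
        echo "WARNING: $1 has less than $2 GB available"
    fi
done
```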
Opsview Monitor uses cron to run a configuration backup job at 23:00 every night.
The purpose of these backups is to help restore all configuration on a system after the required packages have been installed.
You will need to design your own backup strategies for long term archival of data.
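As one possible starting point for long-term archival, the sketch below packs a directory into a date-stamped tarball. The function name archive_backups and both directory arguments are placeholders; this is not an Opsview-provided tool, and copying the archive to off-host storage is left to your own strategy.

```shell
#!/bin/sh
# archive_backups SOURCE_DIR ARCHIVE_DIR
# Packs SOURCE_DIR into a date-stamped tarball under ARCHIVE_DIR and prints
# the path of the new archive. Both directories are chosen by the caller.
archive_backups() {
    src=$1 dest=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/opsview-backup-$stamp.tar.gz"
    mkdir -p "$dest" && tar -czf "$archive" -C "$src" . && echo "$archive"
}
```

A job like this could be scheduled shortly after the nightly 23:00 backup, with ARCHIVE_DIR on a separate disk or mount.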
You can invoke an ad hoc backup by running the following as the opsview user:
This will back up all files within the directories:
You can do a full offline backup by taking the following steps:
- Shut down Opsview Monitor on the Orchestrator
- Shut down Opsview Monitor on all Collectors
- Back up the filesystem /opt/opsview on all Opsview Monitor servers
- Stop mysql on the database server
- Back up the mysql data files
This will back up all data regarding Opsview Monitor so you can restore to this point in time.
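The steps above could be scripted roughly as follows. This is a sketch under several assumptions: the backup destination, the MySQL service name (it may be mysql, mysqld or mariadb on your OS) and the MySQL data directory are all placeholders, and in practice the steps run on different hosts. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Offline backup sketch. With DRY_RUN=1 (the default) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}
BACKUP_DIR=${BACKUP_DIR:-/backup}                 # assumption: destination directory
MYSQL_SERVICE=${MYSQL_SERVICE:-mysql}             # assumption: service name varies by OS
MYSQL_DATADIR=${MYSQL_DATADIR:-/var/lib/mysql}    # assumption: default data directory

# run COMMAND...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

offline_backup() {
    # On the Orchestrator and on every Collector:
    run /opt/opsview/watchdog/bin/opsview-monit stop all
    run tar -czf "$BACKUP_DIR/opt-opsview.tar.gz" -C /opt opsview
    # On the database server:
    run systemctl stop "$MYSQL_SERVICE"
    run tar -czf "$BACKUP_DIR/mysql-data.tar.gz" -C "$MYSQL_DATADIR" .
}

offline_backup
```

Review the printed commands for your environment before setting DRY_RUN=0.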
In this section, we highlight several security aspects of the Opsview Monitor system.
Opsview Monitor's web authentication uses an authentication ticket with a shared secret, which must be set to a unique value for your system.
You can also configure the amount of time that a web session can run before it expires. This is controlled by these two variables in user_vars.yml:
opsview_web_session_timeout_secs: 86400      # Timeout for the web front end, default 24 hours
opsview_rest_api_session_timeout_secs: 3600  # Timeout for the REST API token, default 1 hour
Then run as root:
For a more secure environment, set:
opsview_web_session_timeout_secs: 1800       # Web browser with 30 minute timeout
opsview_rest_api_session_timeout_secs: 1800  # REST API should have the same value
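To audit these settings, a sketch like the following reads the two variables from a flat "key: value" file and reports any that exceed a chosen maximum. The function names are illustrative, and the path to user_vars.yml is passed in by the caller because it depends on your installation.

```shell
#!/bin/sh
# timeout_value FILE KEY
# Prints the value assigned to KEY in a flat "key: value" YAML file,
# with any trailing comment removed.
timeout_value() {
    awk -v key="$2" -F': *' '$1 == key {
        v = $2
        sub(/[ \t]*#.*$/, "", v)   # drop a trailing comment
        sub(/[ \t]+$/, "", v)      # drop trailing whitespace
        print v
    }' "$1"
}

# check_timeouts FILE MAX_SECONDS
# Lists the session timeout settings in FILE that exceed MAX_SECONDS.
check_timeouts() {
    for key in opsview_web_session_timeout_secs opsview_rest_api_session_timeout_secs; do
        value=$(timeout_value "$1" "$key")
        if [ -n "$value" ] && [ "$value" -gt "$2" ]; then
            echo "$key=$value exceeds $2"
        fi
    done
}
```

For example, `check_timeouts /path/to/user_vars.yml 1800` prints nothing when both timeouts are 30 minutes or less.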
We recommend you leave the include_error_detail value at the default of 0, as enabling this option can cause environmental information to be sent to users when a web application exception occurs.
Your Opsview Monitor server should be placed in a secure location. If your server is accessible through a public network, we recommend using a firewall to restrict access to various ports; see Ports.
If necessary you may need to whitelist the following URLs for Opsview purposes:
Ensure you do not have any proxies set up in your user environments that could affect the opsview user. Proxies should be configured only within your package management software (such as /etc/yum.conf on RHEL/CentOS or /etc/apt/apt.conf.d/proxy on Debian/Ubuntu).
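A quick way to spot proxy settings in a user environment is to list the common proxy variables that are set. The sketch below is illustrative: the function name and the list of variable names are assumptions covering the usual conventions, not an exhaustive or Opsview-defined list.

```shell
#!/bin/sh
# proxy_vars_set
# Prints the name of every common proxy variable that is set and non-empty
# in the current shell; prints nothing when none are set.
proxy_vars_set() {
    for name in http_proxy https_proxy ftp_proxy all_proxy \
                HTTP_PROXY HTTPS_PROXY FTP_PROXY ALL_PROXY; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] && echo "$name"
    done
    return 0
}
```

Run it from a login shell as the opsview user; any output indicates a variable that should be removed from that user's environment.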
Opsview Monitor Agents are applications that run on a host to be monitored and return status or performance metrics when requested. Agents can be contacted by the master or collector system (or clients) using certificates and ciphers to encrypt communication. Opsview Monitor agents accept only strong ciphers, such as ADH-128 and ADH-256 (where the OS provides a suitable version of OpenSSL).
For additional security, we recommend using firewall rules to restrict which servers can connect to the agent. You can also use the ‘allowed_hosts’ variable in the agent configuration to limit connections to only the monitoring servers.
Opsview agents also support Secure Socket Layer (SSL) certificates. See Agent Security for more details.
Opsview Monitor has a feature called Security Wallet, which allows you to avoid storing clear text passwords for external systems in the filesystem and database. The user interface will not display any stored passwords and it will not be possible to retrieve any passwords once they have been set.
The master key file is stored in /opt/opsview/coreutils/etc/sw.key and is randomly generated on installation.
If this key file is lost, then all passwords will need to be re-entered.
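Because losing the key file means re-entering every password, you may want to keep a protected copy of it. The following is a minimal sketch; the function name backup_key and the destination directory are assumptions, and where you store the copy should follow your own security policy.

```shell
#!/bin/sh
# backup_key KEY_FILE DEST_DIR
# Copies KEY_FILE into DEST_DIR and restricts the copy to owner read/write.
# (Hypothetical helper; DEST_DIR is a location you choose.)
backup_key() {
    key=$1 dest=$2
    [ -r "$key" ] || { echo "cannot read $key" >&2; return 1; }
    mkdir -p "$dest" && chmod 700 "$dest" || return 1
    cp -p "$key" "$dest/" && chmod 600 "$dest/$(basename "$key")"
}
```

For example, `backup_key /opt/opsview/coreutils/etc/sw.key /path/to/secure/dir`, run as a user that can read the key.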
Where possible, plugins have been updated to hide any sensitive arguments in their command line.
Note: If you have any passwords stored in the Audit Log or any log files before an upgrade, these will not be altered. However, no passwords will be added in any new audit log entries.
Opsview Monitor allows specific arguments to be marked as encrypted. When this has been chosen, a message will appear to confirm that this is what you want to do.
When you save, the default arg value will be encrypted for the Variable object and all related Host attributes will be encrypted as well.
If you decide to mark the arg as unencrypted, then the argument will be cleared from this attribute and all related Host attributes. There is no way in the UI to recover these arguments.
On a new install of Opsview Monitor, we will set the Password argument of the following attributes to be encrypted:
On an upgrade, none of the existing Variable configuration will be encrypted. We recommend you manually convert the above Variable args to be encrypted.
For maximum security, we recommend you disable the uploading of Monitoring Plugins and Opspacks (the default for new installations); see Advanced Automated Installation. You can still import Monitoring Plugins or Opspacks via the command line.
You can set the Linux/Unix server to any time zone; however, we recommend you set the time zone to be UTC. Opsview Monitor will show the time in the browser based on the browser's time zone.
All data that is stored in files and databases will be time stamped in UTC format for consistency.
If you have changed the time zone of your Linux/Unix server, you will need to restart your system so that all services are aware of the update.
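To confirm the server's time zone after a change, a check such as the one below can help. It is a small sketch with an illustrative function name; note that it tests for a zero offset from UTC, so a zone that merely coincides with UTC at the moment (for example, one on winter time) also passes.

```shell
#!/bin/sh
# is_utc
# Succeeds when the current time zone has a zero offset from UTC.
# This is an offset check, not a check of the zone's name.
is_utc() {
    [ "$(date +%z)" = "+0000" ]
}

if is_utc; then
    echo "Time zone offset is +0000"
else
    echo "Time zone offset is $(date +%z), not UTC"
fi
```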
The Opsview Monitor installation will create the opsview system user and group if they do not already exist.
If you use an external provider for authentication, the opsview user and group should be configured as a local user and group to remove the dependency on your external authentication provider.
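One way to confirm that the account is local is to look for it directly in the local account files, as in the sketch below. The function name is illustrative. The files are read directly rather than through getent, since getent also consults external providers configured in the name service switch.

```shell
#!/bin/sh
# is_local_account NAME FILE
# Succeeds when NAME has an entry in FILE, a passwd- or group-format file.
is_local_account() {
    grep -q "^$1:" "$2"
}

if is_local_account opsview /etc/passwd && is_local_account opsview /etc/group; then
    echo "opsview user and group are defined locally"
else
    echo "opsview user or group is not defined locally"
fi
```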
The Opsview Monitor installation will update the .profile (or .bash_profile) in the user’s home directory to source /opt/opsview/coreutils/bin/profile, which sets several Opsview Monitor environment variables.
Opsview Monitor uses four databases, as described below:
- opsview: Monitoring configuration and access control
- runtime: Status data and short-term history
- odw: Long-term retention of data
- dashboard: State information about the Dashboard application
Both the opsview and runtime databases must be on the same server, whereas the odw and dashboard databases can be located on separate servers. In fact, some performance improvements can be achieved by locating the odw and dashboard databases on separate servers. For more information see section Databases on a Different Server.
New installations of Opsview Monitor will use randomly-generated passwords to connect to the databases. These passwords will be encrypted, so it will not be possible to decrypt these credentials to use to connect to the databases.
If you need to connect to the databases to run your own queries, we recommend that you create your own accounts for this purpose. Also, you can restrict the amount of information that would be available to this account, by limiting the tables that can be queried.
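A restricted account of this kind can be created with standard MySQL CREATE USER and GRANT SELECT statements. The sketch below only prints the SQL; the function name, the account name reporting and the table name example_table are hypothetical placeholders, so substitute the tables you actually need to query.

```shell
#!/bin/sh
# readonly_grants USER HOST DATABASE TABLE...
# Prints SQL that creates USER@HOST and grants SELECT on the listed tables only.
# Replace CHANGE_ME with a real password before running the statements.
readonly_grants() {
    user=$1 host=$2 db=$3
    shift 3
    echo "CREATE USER '$user'@'$host' IDENTIFIED BY 'CHANGE_ME';"
    for table in "$@"; do
        echo "GRANT SELECT ON \`$db\`.\`$table\` TO '$user'@'$host';"
    done
}

# Hypothetical account and table names, for illustration only.
readonly_grants reporting localhost runtime example_table
```

After reviewing and editing the output, it can be saved to a file and imported with `mysql -u root -p < file.sql`.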
Opsview Monitor uses MariaDB for database storage on RHEL7 and CentOS7, and MySQL on all other distributions. Opsview Monitor software supports the version of MySQL included as part of our supported Operating Systems (OSs). This ranges from MySQL v5.0, v5.1 through to v5.7 and includes MariaDB v5.5, v5.7 and v10.1; MySQL v5.6 is not currently supported. Similarly, if you use a remote database server, Opsview Monitor will only support the database server if it is based on one of our supported platforms.
Opsview does not support MySQL's strict mode. Ensure that the following is not set in MySQL's
It is recommended that you install MySQL prior to installing the Opsview Monitor software since you can then 'tune' your MySQL database server before Opsview Monitor creates its necessary databases.
Note: Make a note of your MySQL root password, as you will be prompted for this during the installation process.
Add these entries in the mysqld section of the
These options will cause transactional changes to be flushed to disk once per second, as opposed to on a 'per transaction' basis; see InnoDB Startup Options and System Variables for more information. Finally, on a dedicated MySQL server, you should set the innodb_buffer_pool_size as large as possible, leaving approximately 15% free memory for the operating system; see below:
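To turn the 15% guideline into a number, the sketch below computes 85% of the server's memory. This assumes the guideline means roughly 85% of physical memory goes to the buffer pool on a dedicated database server; the function name is illustrative.

```shell
#!/bin/sh
# buffer_pool_mb TOTAL_MB
# Prints 85% of TOTAL_MB, rounded down, as a candidate innodb_buffer_pool_size
# in megabytes. (Assumes a dedicated MySQL server, as stated above.)
buffer_pool_mb() {
    echo $(( $1 * 85 / 100 ))
}

buffer_pool_mb 16384    # a 16GB dedicated server: 13926
```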
MySQLReport is a useful tool to evaluate the performance of MySQL. To tune MySQL, edit these values in the mysqld section of /etc/mysql/my.cnf (OS dependent) and restart mysqld.
Good starting values for a small database server with 2-4GB of memory are:
- table_cache = 768 (check tables opened/sec in mysqlreport)
- query_cache_size = 16M (this should not be any higher due to limitations in MySQL - see this post)
- key_buffer = 256M
- innodb_buffer_pool_size = 1024M
- innodb_file_per_table = 1
- innodb_flush_log_at_trx_commit = 2
- innodb_autoinc_lock_mode=1 # Required for replication with MySQL 5.1 or later
- max_allowed_packet = 16M
- binlog_format = 'MIXED' # when using binary logs in replication
- max_connections = 150
- tmp_table_size = 64M # To allow each connection to sort tables in memory. Maximum possible is max_connections x tmp_table_size
- max_heap_table_size = 64M # Set to the same as tmp_table_size
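The note on tmp_table_size gives the worst case as max_connections x tmp_table_size. The sketch below works that product out for the starting values above; the function name is illustrative. The result is a theoretical ceiling that is worth comparing against the memory of the server.

```shell
#!/bin/sh
# worst_case_tmp_mb MAX_CONNECTIONS TMP_TABLE_SIZE_MB
# Prints the memory, in MB, used if every connection filled one in-memory
# temporary table at the same time.
worst_case_tmp_mb() {
    echo $(( $1 * $2 ))
}

worst_case_tmp_mb 150 64    # starting values above: 9600
```

With the values listed, the ceiling (9600MB) is larger than the 2-4GB of memory assumed for the server, which is one reason to lower these values if many connections sort large tables concurrently.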
You can see the current values with
You may want to consider starting mysqld without name resolution. See http://dev.mysql.com/doc/refman/5.0/en/dns.html for more information.
You can use the /opt/opsview/coreutils/installer/opsview_preupgrade_check mysql_variables script to see if there are any variables that need changing. Values will depend on the resources available on your server, so this acts only as a rough guideline.
Note: the recommendations for MySQL server variables depend on your system and what other services run on it, so you have to exercise judgement when changing your system. Make sure that you do not over-commit resources to MySQL because if it causes the server to go into swap space, this will reduce the performance of MySQL.
Note: the crashed tables check may take a while to run. More information about the InnoDB parameters is available on the MySQL documentation site.
Use iostat -x 5 to check disk I/O. This gives I/O statistics per disk. The overall I/O wait time can look low even while a single disk is in use 100% of the time.
For maximum I/O, you should stripe the disks so that all disks are being utilized.
You should use separate disks for data files and index files - this improves read and write times.
You should use a fast disk for redo logs.
In this section, we describe how you can set up Opsview Monitor to use databases on a different server rather than exclusively running them on the Opsview Monitor master. As a result of undertaking this process, there will be an outage of the Opsview Monitor server during backup and restore. We recommend that you undertake a routine test to better understand how long the restore process will take prior to committing.
The database server only requires MySQL to be installed, along with its dependencies (the opsview-agent may be installed if you are not using SNMP to monitor the server). You should ensure that mysql is listening on the appropriate port and that other Opsview-specific configurations have been applied.
You will need to stop Opsview Monitor to achieve a consistent snapshot of the database and, as such, in the example below, we show you how to stop Opsview Monitor.
Note: All commands should be run as the opsview user unless otherwise stated.
sudo /opt/opsview/watchdog/bin/opsview-monit stop all
Here, we assume that you have undertaken a full database export; however, you may, of course, use any other application to back up your MySQL databases. You should be aware that if the new database server is located on a different architecture, that is, 32- or 64-bit, then you will need to export your database, as shown in the example below:
Now run the below command as root making sure to include any extra databases you may have (for example, include jasperserver if it exists). This will create a full database export.
mysqldump -u root -p --add-drop-database --opt --databases opsview runtime odw dashboard | gzip -c > databases.sql.gz
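Before copying the export to the new server, it is worth confirming that the file is intact. The sketch below is one way to do this; the function name verify_dump is illustrative. It relies on gzip -t to test the archive and on the fact that mysqldump writes CREATE DATABASE statements when --databases is used.

```shell
#!/bin/sh
# verify_dump FILE
# Succeeds when FILE is an intact gzip stream that contains at least one
# CREATE DATABASE statement.
verify_dump() {
    gzip -t "$1" 2>/dev/null || return 1
    gunzip -c "$1" | grep -q 'CREATE DATABASE'
}
```

For example, `verify_dump databases.sql.gz && echo "export looks complete"`.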
On your new server, restore the databases, as shown in the example below. You should also verify that the character set of the new databases is the same as in your previous installation, since a mismatch may cause issues when upgrading.
gunzip -c databases.sql.gz | mysql -u root -p
Note: Please refer to the Opsview Components documentation for how to update individual configuration files if you migrate your database.
Edit the /opt/opsview/coreutils/etc/opsview.conf file on the Opsview Monitor master server and update it with the following content:
#
# This file overrides variables from opsview.defaults
# This file will not be overwritten on upgrades
#
$dbhost = "localhost";
$dbport = "3306";
$dbpasswd_encrypted = "redacted";
$odw_dbhost = "localhost";
$odw_dbport = "3306";
$odw_dbpasswd_encrypted = "redacted";
$runtime_dbhost = "localhost";
$runtime_dbport = "3306";
$runtime_dbpasswd_encrypted = "redacted";
$dashboard_dbhost = "localhost";
$dashboard_dbport = "3306";
$dashboard_dbpasswd_encrypted = "redacted";
$authtkt_shared_secret_encrypted = "redacted";
1;
The values here are encrypted by default on new installations of Opsview Monitor.
You will need to replace all the redacted entries with encrypted strings. To generate the encrypted string for a plaintext password, you can use the following command:
$ /opt/opsview/coreutils/bin/opsview_crypt
Enter text to encrypt: ********
Encrypted value: fe4940b982ee95eb881c8269fa1b227c08b84dd71e286c6050682715ea11d818
Copy this value and paste it between the quotation marks, for example:
$runtime_dbpasswd_encrypted = 'fe4940b982ee95eb881c8269fa1b227c08b84dd71e286c6050682715ea11d818';
Note: Both runtime ($runtime_dbhost) and opsview ($dbhost) need to be on the same host since this enables queries (joins) to be made across both databases.
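After editing opsview.conf, a quick sanity check can catch the two mistakes described above: placeholder values left in place, and $dbhost and $runtime_dbhost pointing at different hosts. The sketch below is illustrative; the function names are assumptions, and it expects one setting per line in the `$name = "value";` form.

```shell
#!/bin/sh
# conf_value FILE NAME
# Prints the quoted value assigned to $NAME in an opsview.conf-style file.
conf_value() {
    sed -n 's/^ *\$'"$2"' *= *["'"'"']\(.*\)["'"'"']; *$/\1/p' "$1"
}

# check_conf FILE
# Fails when any *_encrypted setting still holds the placeholder "redacted",
# or when $dbhost and $runtime_dbhost differ.
check_conf() {
    status=0
    if grep -n '_encrypted *= *["'"'"']redacted' "$1"; then
        echo "placeholder values remain in $1" >&2
        status=1
    fi
    if [ "$(conf_value "$1" dbhost)" != "$(conf_value "$1" runtime_dbhost)" ]; then
        echo "dbhost and runtime_dbhost differ in $1" >&2
        status=1
    fi
    return $status
}
```

For example, `check_conf /opt/opsview/coreutils/etc/opsview.conf` prints the offending lines and returns a non-zero status if either problem is found.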
You should also set up access controls with your new database server so that the Opsview Monitor master is allowed to connect. So, from the Opsview Monitor master run the command shown below:
/opt/opsview/coreutils/bin/db_mysql -t > opsview_access.sql
opsview_access.sql now contains all the necessary access credentials, which should be transferred to the new database and imported as follows:
mysql -u root -p < opsview_access.sql
For additional security, you may want to restrict access to the Opsview Monitor master only.
From the Opsview Monitor Primary Server, restart opsview-web and regenerate the configuration for Opsview Monitor, as this will update third-party software applications with the new connection information, as shown below:
/opt/opsview/coreutils/bin/rc.opsview gen_config
/opt/opsview/watchdog/bin/opsview-monit start opsview-web
Note: You may also need to restart any other Opsview component (see Opsview - Component - Datastore) if you have moved or modified it.