Known Issues
An overview of the known issues in this release of Opsview Monitor
Overview
The following issues have been identified in this release of Opsview Monitor:
OS specific
Ubuntu 18
Email notifications stop working until an additional package is installed
On Ubuntu 18, the notify_by_email Notification Method fails because /usr/bin/mail has been deprecated and replaced with /usr/bin/s-nail.
Installing bsd-mailx resolves the issue.
apt install bsd-mailx
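To see which implementation /usr/bin/mail currently resolves to (before or after installing bsd-mailx), a small helper can follow the symlink chain. This is a sketch that only assumes GNU readlink; the function name mail_backend is hypothetical and not part of Opsview.

```shell
# mail_backend (hypothetical helper): print the name of the binary that a
# mail command ultimately resolves to, following every symlink in the chain
# (for example through /etc/alternatives on Ubuntu).
# Usage: mail_backend [PATH]   (defaults to /usr/bin/mail)
mail_backend() {
    local cmd="${1:-/usr/bin/mail}"
    if [ ! -e "$cmd" ]; then
        echo "not found: $cmd" >&2
        return 1
    fi
    # readlink -f resolves the whole symlink chain to the final file
    basename "$(readlink -f "$cmd")"
}
```

If mail_backend still reports s-nail after bsd-mailx has been installed, the alternatives system may need to be pointed at bsd-mailx, for example with update-alternatives --config mailx (the alternative name mailx is an assumption; check it on your system).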
Ubuntu 20
Re-enabling TLS 1.0 and TLS 1.1
Ubuntu 20 ships with TLS 1.0 and TLS 1.1 disabled by default. As a result, connections made through the OpenSSL libraries to external services that only support these older protocols may fail.
Ideally, the external service should be upgraded to support TLS 1.2, but if that is not possible, you can re-enable TLS 1.0 and TLS 1.1. Note that doing so reduces security.
To test the external service:
openssl s_client -connect SERVER:443 -tls1_2
This will fail if the external service does not support TLS 1.2.
To allow Ubuntu 20 to use TLS 1.0 and TLS 1.1, edit /etc/ssl/openssl.cnf and add this at the top:
openssl_conf = openssl_configuration
Then add this at the bottom:
[openssl_configuration]
ssl_conf = ssl_configuration
[ssl_configuration]
system_default = tls_system_default
[tls_system_default]
MinProtocol = TLSv1
CipherString = DEFAULT:@SECLEVEL=1
Now check that connections will work.
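To compare which protocol versions a service accepts before and after the change, the openssl s_client test can be repeated for each version. This is a minimal sketch assuming openssl and the coreutils timeout command are installed; the function name tls_probe is hypothetical. On Ubuntu 20, a FAILED result for tls1 or tls1_1 may come from the local default policy rather than from the server, which is what the openssl.cnf change above is meant to relax.

```shell
# tls_probe (hypothetical helper): attempt a handshake with each protocol
# version and print OK or FAILED for each one.
# Usage: tls_probe SERVER [PORT]   (port defaults to 443)
tls_probe() {
    local host="$1" port="${2:-443}" flag
    for flag in tls1 tls1_1 tls1_2; do
        # stdin from /dev/null makes s_client exit right after the handshake;
        # timeout stops it hanging on hosts that silently drop packets
        if timeout 10 openssl s_client -connect "${host}:${port}" \
                -servername "$host" "-${flag}" </dev/null >/dev/null 2>&1; then
            echo "${flag}: OK"
        else
            echo "${flag}: FAILED"
        fi
    done
}
```

For example, tls_probe SERVER prints one line per protocol version, so running it before and after editing /etc/ssl/openssl.cnf shows whether the change took effect.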
Non-RHEL8
SNMP Polling
SNMP Polling Checks do not support the ‘aes256’ and ‘aes256c’ SNMPv3 privacy protocols when run on non-RHEL8 collectors. If this is attempted, the check may return an UNKNOWN state with an error message beginning with one of the following:
External command error: Invalid privacy protocol specified after -x flag: aes256
or
External command error: Invalid privacy protocol specified after -x flag: aes256c
See SNMP Privacy Protocol Support for further details.
SNMP Traps
SNMP Traps sent using the ‘aes256’ and ‘aes256c’ SNMPv3 privacy protocols will not appear if they are received by non-RHEL8 collectors (see SNMP Privacy Protocol Support).
OS Generic
UI
- [6.7.5 and below] Once a User Role has been authorised for a Host Group, removing the authorisation later does not prevent them from viewing Hosts within the group in the Navigator or Network Topology maps.
- [6.7.5 and below] Network Topology maps do not allow a User Role with the VIEWALL permission to view all Hosts unless the Role is additionally enabled for "All host groups" and "All Service Groups" in the "Status Objects" permissions tab.
Upgrade/Installation
- The opsview-deploy package needs to be upgraded before running opsview-deploy to upgrade an Opsview Monitor system.
- Changing the flow collector configuration in Opsview Monitor currently requires a manual restart of the flow-collector component for it to start working again.
- At upgrade, the following are not preserved:
- Downtime: we recommend that you cancel any downtime (either active or scheduled) before you upgrade/migrate. Scheduling new downtime will work fine.
- Flapping status: the pre-upgrade/migration flapping state is not retained, but if the host or service is still flapping, subsequent checks will set the flapping status again.
- Acknowledgements: at the end of an upgrade/migration, the first reload removes the acknowledgement state from hosts and services. Any further acknowledgement will work as usual.
- If you use an HTTP proxy in your environment, the TimeSeries daemons may not be able to communicate. You can work around this by adding the following NO_PROXY environment variable (note: upper case, not lower case) to the opsview user's .bashrc file:
export NO_PROXY=localhost,127.0.0.1
- Hosts and services in downtime will appear to remain in downtime even after it has been cancelled. You can work around this by creating a new downtime, waiting until it starts and then cancelling it, or by adding a downtime that lasts only 5 minutes and letting it expire naturally.
- On rare occasions, opsview-messagequeue may fail to upgrade correctly when running opsview-deploy. See MessageQueue Troubleshooting for steps to resolve the issue.
- The sync_monitoringscripts.yml playbook fails to execute when the SSH connection between the host where opsview-deploy is run and the other instances relies on a user other than root, and the private SSH key is defined only through the ansible_ssh_private_key_file property in opsview_deploy.yml. This happens because the underlying rsync command is not passed the private SSH key and therefore cannot connect to the instances. To work around this issue, add matching Host entries (user and identity file) for each instance to root's SSH configuration in /root/.ssh/config. Consider the following example:
# If you use ansible_ssh_private_key_file in the opsview_deploy.yml file
(...)
collector_clusters:
  cluster-A:
    collector_hosts:
      ip-172-31-9-216:
        ip: 172.31.9.216
        user: ec2-user
        vars:
          ansible_ssh_private_key_file: /home/ec2-user/.ssh/ec2_key
      ip-172-31-5-98:
        ip: 172.31.5.98
        user: ec2-user
        vars:
          ansible_ssh_private_key_file: /home/ec2-user/.ssh/ec2_key
(...)
# You need to add the following entries to /root/.ssh/config
Host ip-172-31-9-216 172.31.9.216
    User ec2-user
    IdentityFile /home/ec2-user/.ssh/ec2_key
Host ip-172-31-5-98 172.31.5.98
    User ec2-user
    IdentityFile /home/ec2-user/.ssh/ec2_key
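With many collectors, writing these /root/.ssh/config stanzas by hand is error-prone. The following is a sketch of a helper that prints one stanza per collector from the same values used in opsview_deploy.yml; the function name ssh_stanza is hypothetical.

```shell
# ssh_stanza (hypothetical helper): print a /root/.ssh/config entry for one
# collector, matching both its hostname and its IP address.
# Usage: ssh_stanza NAME IP USER KEYFILE
ssh_stanza() {
    local name="$1" ip="$2" user="$3" key="$4"
    printf 'Host %s %s\n    User %s\n    IdentityFile %s\n' \
        "$name" "$ip" "$user" "$key"
}
```

For example, ssh_stanza ip-172-31-9-216 172.31.9.216 ec2-user /home/ec2-user/.ssh/ec2_key >> /root/.ssh/config appends the first entry shown above. Running ssh -G ip-172-31-9-216 | grep -E '^(user|identityfile) ' as root (OpenSSH 6.8 or later) then prints the effective settings, which is a quick way to confirm the entry is picked up.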
Plugins
- There is no automated mechanism in this release to synchronize scripts between the Opsview Monitor Orchestrator and Collector Clusters. A sync_monitoringscripts.yml deploy playbook is provided for this purpose, but it must be run manually or from cron on a regular basis.
- check_wmi_plus.pl may report errors relating to files in your /tmp/* directory, because the ownership of these files needs to be changed to the opsview user. This is seen when upgrading from an earlier version of Opsview, where the plugin was previously run by the nagios user.
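A sketch of the ownership fix for check_wmi_plus.pl, assuming the affected files are the top-level entries in /tmp owned by the nagios user and that the new owner is opsview:opsview (the group name is an assumption; check it on your system). The function name fix_tmp_owner is hypothetical, and it must be run as root to change ownership.

```shell
# fix_tmp_owner (hypothetical helper): hand over files in a directory from
# the old plugin user to the new one, listing each file it changes.
# Usage: fix_tmp_owner [DIR] [OLD_USER] [NEW_OWNER]
fix_tmp_owner() {
    local dir="${1:-/tmp}" old="${2:-nagios}" new="${3:-opsview:opsview}"
    # Only top-level entries owned by the old user are matched; -print lists
    # each match before chown is run on it
    find "$dir" -maxdepth 1 -mindepth 1 -user "$old" -print \
        -exec chown "$new" {} \;
}
```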
Modules support
- SMS Gateway is not available in this release. If you rely on this method, please contact Support.
Collectors and clusters
- Despite the UI/API currently allowing it, you should not set parent/child relationships between the collectors themselves in any monitoring cluster; collectors do not have a dependency between each other and are considered equals.
- When trying to Investigate a host, if you get an Opsview Web Exception error with the message "Caught exception in Opsview", this may indicate that Cluster monitoring for that host has failed and needs to be addressed.
Database changes
- All database users created by Opsview will use the mysql_native_password authentication plugin (for MySQL 8, the default is usually caching_sha2_password)
- The nightly backups of the opsview and runtime databases are now based on the MySQL server’s preferred format, rather than a mysql40 compatible mode
- When using utf8mb4, the collation differs from latin1, so some rows may come back in a slightly different order (e.g. check_snmp_weblogic_jmsqueuelength and check_snmp_weblogic_jsm_dests are returned in one order with latin1 and in the reverse order with utf8mb4)
MySQL RPM Repository Key
The MySQL RPM Repository Key stored within the product has expired. This has been fixed in a later version of Opsview Monitor, but it can be amended locally without upgrading.
For APT-based systems, edit /opt/opsview/deploy/lib/roles/opsview_database/vars/apt.yml on the Orchestrator, find the repo_key_id line and amend it as follows:
mysql:
  ...
  repo_key_id: 3A79BD29
For RPM-based systems, edit /opt/opsview/deploy/lib/roles/opsview_database/vars/yum.yml on the Orchestrator, find the gpgkey line under mysql and amend it as follows:
mysql:
  ...
  gpgkey: http://repo.mysql.com/RPM-GPG-KEY-mysql-2022
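If you prefer to make this change from the command line, a sed substitution that keeps the existing indentation is one option. This is a sketch assuming each key appears only once in the file and that the new value contains no '|', '&' or backslash characters; the function name set_yaml_value is hypothetical, and the file should be backed up first.

```shell
# set_yaml_value (hypothetical helper): replace the value of a single
# "key: value" line in a YAML file, keeping its indentation.
# Usage: set_yaml_value FILE KEY VALUE
set_yaml_value() {
    local file="$1" key="$2" value="$3"
    if ! grep -Eq "^[[:space:]]*${key}:" "$file"; then
        echo "key not found: $key" >&2
        return 1
    fi
    # '|' is the sed delimiter because values such as URLs contain '/'
    sed -i -E "s|^([[:space:]]*${key}:).*|\1 ${value}|" "$file"
}
```

For example: set_yaml_value /opt/opsview/deploy/lib/roles/opsview_database/vars/apt.yml repo_key_id 3A79BD29. If the key name is not present, the helper reports it and leaves the file unchanged.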
REST API
- REST API config/OBJECT list calls: the ordering of results when using MySQL 8 is not necessarily deterministic, so REST API calls may need to specify a subsort field. For example, for hosts, order=hostgroup.name is not sufficiently deterministic and needs to be order=hostgroup.name,id so that the results come back in a fixed order.
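A sketch of a list call with a deterministic sort order, using curl. It assumes you already have a session token from the REST API login call and that the X-Opsview-Username and X-Opsview-Token headers are used for authentication; the function name, base URL and the OPSVIEW_USER/OPSVIEW_TOKEN environment variables are placeholders.

```shell
# opsview_list (hypothetical helper): list config objects sorted by a primary
# field plus id as a tie-breaker.
# Usage: opsview_list BASE_URL OBJECT ORDER
# Expects OPSVIEW_USER and OPSVIEW_TOKEN to be set in the environment.
opsview_list() {
    local base="$1" object="$2" order="$3"
    # -G sends the --data-urlencode parameter as a query string on a GET
    curl -sfG "${base}/rest/config/${object}" \
        -H "X-Opsview-Username: ${OPSVIEW_USER}" \
        -H "X-Opsview-Token: ${OPSVIEW_TOKEN}" \
        -H 'Content-Type: application/json' \
        --data-urlencode "order=${order}"
}
```

For example, opsview_list https://opsview.example.com host hostgroup.name,id requests hosts ordered by host group name, with id breaking ties between hosts in the same group.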
Backups
- [6.7.8 and above] If the "Opsview - Daily Backups" self-monitoring Service Check reports a CRITICAL exit code 1 for the backup, and the backup file exists correctly on disk in /opt/opsview/coreutils/var/backups/, then the Check output can be ignored. This issue is fixed in Opsview 6.8.2 and above.
- Backups can report an error relating to a missing RELOAD MySQL privilege if using the latest MySQL 5.7 or 8.0. This is fixed in Opsview 6.8.2 and above.
- If restoring from the Audit Log incorrectly returns the message “Restore failure: A restore is already in progress”, then a previous restore attempt may have failed to finish. Any issues found are logged in /opt/opsview/coreutils/var/log/db_restore.log. To force the system to allow a new restore attempt, delete the file /opt/opsview/coreutils/var/restore_in_progress.lock. This is fixed in Opsview 6.8.3 and above.
- If manually restoring a backup of the runtime DB results in the UI returning a "Web Exception", run the following command to resolve it:
sudo /opt/opsview/coreutils/utils/cx runtime 'INSERT INTO opsview_metadata VALUES (\"postreload\", UNIX_TIMESTAMP())'
This is fixed in Opsview 6.8.3 and above.
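Before deleting the restore lock file, it is worth checking the restore log for the reason the earlier attempt stopped. A sketch that does both, assuming no restore is genuinely still running; the function name clear_restore_lock is hypothetical.

```shell
# clear_restore_lock (hypothetical helper): show the end of the restore log,
# then remove a stale restore lock file if one exists.
# Usage: clear_restore_lock [LOCK_FILE] [LOG_FILE]
clear_restore_lock() {
    local lock="${1:-/opt/opsview/coreutils/var/restore_in_progress.lock}"
    local log="${2:-/opt/opsview/coreutils/var/log/db_restore.log}"
    if [ -f "$log" ]; then
        tail -n 20 "$log"
    fi
    if [ -e "$lock" ]; then
        rm -f -- "$lock" && echo "removed $lock"
    else
        echo "no lock file at $lock"
    fi
}
```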
Other Issues
- There is no option to set a new Home Page via the UI yet. For new installations, the Home Page is set to the Configuration > Navigator page.
- Start and End Notifications for flapping states are not implemented in this release (when a Host or Service is flapping, all notifications are suppressed).
- Deploy cannot be used to update the database root password. Root user password changes should be made manually and the /opt/opsview/deploy/etc/user_secrets.yml file updated with the correct password.
- When a Host has been configured with 2 or more Parents and all of them are DOWN, the status of the Service Checks on the host is set to CRITICAL instead of UNKNOWN. Consequently, the Status Information is not accurate either.
- If an Opsview Monitor system is configured to have UDP logging enabled in rsyslog, RabbitMQ will log INFO level messages to opsview.log and syslog at a high frequency (approximately 1 message every 20 seconds).
- Some components, such as opsview-web and opsview-executor, can log credential information when in Debug mode.
- When running an Autodiscovery Scan via a cluster for the first time there must be at least one host already being monitored by that cluster. If the cluster does not monitor at least one host, the scan may fail with this message: "Cannot start scan because monitoring server is deactivated".
- When running an Autodiscovery Scan for the first time after an upgrade, it may fail to begin and remain in the Pending state. To resolve this, simply restart the opsview-autodiscoverymanager component on the Opsview Master Server (orchestrator). After the component has restarted successfully, the scan will start.
- You may get occasional errors appearing in syslog, such as:
Nov 28 16:31:50 production.opsview.com opsview-datastore[<0.6301.0>] req_err(2525593956) unknown_error : normal#012
[<<"chttpd:catch_error/3 L353">>,<<"chttpd:handle_req_after_auth/2 L319">>,<<"chttpd:process_request/1 L300">>,
<<"chttpd:handle_request_int/1 L240">>,<<"mochiweb_http:headers/6 L124">>,<<"proc_lib:init_p_do_apply/3 L247">>]
These can be ignored, as there is no operational impact.
- To get SNMP Traps working in a hardened environment, the following settings need to be changed:
# Add the following lines to /etc/hosts.allow
snmpd:ALL
snmptrapd:ALL
# Add the following lines to /etc/hosts.deny
snmpd: ALL: allow
snmptrapd: ALL: allow
- Using Delete All on the SNMP Traps Exceptions page may sometimes hide new exceptions as they arrive. They can be viewed again by changing the 'Page Size' at the bottom of the window to a different number.
- CPU utilization is sometimes high due to the datastore.
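The Autodiscovery workaround above involves restarting opsview-autodiscoverymanager on the orchestrator, which can be done with the same opsview-monit wrapper used elsewhere on this page. A minimal sketch; the function name restart_component is hypothetical, and the OPSVIEW_MONIT variable exists only so the wrapper path can be overridden.

```shell
# restart_component (hypothetical helper): restart an Opsview component via
# the watchdog wrapper. Usage: restart_component NAME
restart_component() {
    local monit="${OPSVIEW_MONIT:-/opt/opsview/watchdog/bin/opsview-monit}"
    "$monit" restart "$1"
}
```

For example, restart_component opsview-autodiscoverymanager runs /opt/opsview/watchdog/bin/opsview-monit restart opsview-autodiscoverymanager.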
AutoMonitor
- When an AutoMonitor Windows Express Scan is set with a wrong, but still reachable, Active Directory Server IP or FQDN, the scan could remain in a "pending" state until it times out (1 Hour default value). This means that no other scans can run on the same cluster for that period of time. This is due to PowerShell not timing out correctly.
- Automonitor automatically creates the Host Groups used for the scan: Opsview > Automonitor > Windows Express Scan > Domain. If any of these Host Groups already exist elsewhere in Opsview Monitor, the scan will fail. If one of the Host Groups is moved, it should be renamed to avoid this problem.
- If you have renamed your top-level Opsview Host Group, the Automonitor scan will currently fail. You will need to rename it back to Opsview or create a new Opsview Host Group for the scan to succeed.
- The Automonitor application clears local storage on logout. This means that if a scan is in progress and a user logs out, they will not see that scan's progress when they log back in, even if it is still running in the background.
- Any services already in dependency failure before upgrading to this release will not return to their previous state when leaving dependency failure, since that state will not have been saved. They will remain down until the next check occurs, as per the existing behaviour. However, any services that go into dependency failure after the upgrade has completed will follow the new recovery behaviour, as documented in Important Concepts .
Opspacks
- Due to changes made to the Windows Active Directory Opspack, Windows hosts must now have PowerShell version 5.0 or higher.
- Due to the same Active Directory Opspack changes, setup-opsview.yml must be re-run to import the new Opspack plugin changes. A reload must also be carried out afterwards to propagate the argument changes through to the collection plan for the Scheduler(s).
- Windows Active Directory Opspack checks may increase CPU usage on the target Windows servers.
- Windows WMI - Base Agentless - LAN Status Servicecheck: utilization values for the bytes sent/bytes received rates of network adapters are around 8 times lower than expected. As a workaround, Warning and Critical thresholds should be adjusted accordingly. See Plugin Change Log.
- Cloud - AWS related Opspacks: the directory /opt/opsview/monitoringscripts/etc/plugins/cloud-aws, which is the default location for the aws_credentials.cfg file, is not created automatically by Opsview, so it needs to be created manually.
- If opsview_tls_enabled is set to false, the Cache Manager component used by the Application - Kubernetes and OS - VMware vSphere Opspacks will not work correctly in distributed environments.
- 'Hardware - Cisco UCS': if this Opspack is migrated from an Opsview v5.x system, it may produce the error
Error while trying to read configuration file
or
File "./check_cisco_ucs_nagios", line 25, in <module> from UcsSdk import * ImportError: No module named UcsSdk
If this is seen, running the following will resolve the issue:
# as root
wget https://community.cisco.com/kxiwq67737/attachments/kxiwq67737/4354j-docs-cisco-dev-ucs-integ/862/1/UcsSdk-0.8.3.tar.gz
tar zxfv UcsSdk-0.8.3.tar.gz
cd UcsSdk-0.8.3
sudo python setup.py install
Then place the config file 'cisco_ucs_nagios.cfg' into the plugins path /opt/opsview/monitoringscripts/plugins/.
- The Opsview - Login Servicecheck is critical on a rehomed system. Resolve this by adding an exception to the Servicecheck on the Host, specifying /opsview/login as the destination rather than /login.
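A sketch for creating the Cloud - AWS credentials directory mentioned above. The default owner opsview:opsview and the mode 750 are assumptions (chosen because the plugins run as the opsview user); adjust them to your own policy. The function name ensure_aws_dir is hypothetical.

```shell
# ensure_aws_dir (hypothetical helper): create the cloud-aws plugin directory
# and, if an owner is given, hand it to that owner.
# Usage: ensure_aws_dir [DIR] [OWNER]
# OWNER defaults to opsview:opsview; pass "" explicitly to skip chown.
ensure_aws_dir() {
    local dir="${1:-/opt/opsview/monitoringscripts/etc/plugins/cloud-aws}"
    local owner="${2-opsview:opsview}"   # ${2-...}: default only when unset
    mkdir -p -- "$dir" || return 1
    chmod 750 "$dir"
    if [ -n "$owner" ]; then
        chown "$owner" "$dir"
    fi
}
```

Run as root with no arguments, it creates /opt/opsview/monitoringscripts/etc/plugins/cloud-aws, after which aws_credentials.cfg can be placed there.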
Unicode Support
- Entering non-UTF-8 characters into Opsview Monitor does not cause problems, but the rendering of those characters in the user interface may be altered in places such as free-text comments.
SNMP Traps
SNMPTraps daemons are started on all nodes within a cluster. At start-up a 'master SNMP trap node' is selected and is the only one in a cluster to receive and process traps. Other nodes silently drop traps.
Most SNMP trap-sending devices can send to at most 2 different destinations.
The current workaround is to manually pick two nodes in a given cluster to act as the SNMP trap node and the standby node, and then mark all other nodes within the cluster so that the trap daemons are not installed on them, for example:
collector_clusters:
  Trap Cluster:
    collector_hosts:
      traptest-col01: { ip: 192.168.18.53, ssh_user: centos }
      traptest-col02: { ip: 192.168.18.157, ssh_user: centos }
      traptest-col03: { ip: 192.168.18.155, ssh_user: centos, vars: { opsview_collector_enable_snmp: False } }
      traptest-col04: { ip: 192.168.18.61, ssh_user: centos, vars: { opsview_collector_enable_snmp: False } }
      traptest-col05:
        ip: 192.168.18.61
        ssh_user: centos
        vars:
          opsview_collector_enable_snmp: False
On a fresh installation the daemons will not be installed.
On an existing installation, the trap packages must be removed from the inactive nodes, and the trap daemons on the 2 active nodes restarted to re-elect the master trap node:
# INACTIVE NODES:
# CentOS/RHEL
yum remove opsview-snmptraps-base opsview-snmptraps-collector
# Ubuntu/Debian
apt-get remove opsview-snmptraps-base opsview-snmptraps-collector
# ACTIVE NODES:
/opt/opsview/watchdog/bin/opsview-monit restart opsview-snmptrapscollector
/opt/opsview/watchdog/bin/opsview-monit restart opsview-snmptraps
Opsview Reporting Module
When upgrading to the latest version of the Reporting Module, email settings will need to be reapplied.
Email configuration can be found in the file: /opt/opsview/jasper/apache-tomcat/webapps/jasperserver/WEB-INF/js.quartz.properties
To configure email, edit the following lines in the configuration to match your requirements.
Example configuration for internal email:
report.scheduler.mail.sender.host=localhost
report.scheduler.mail.sender.username=admin
report.scheduler.mail.sender.password=password
report.scheduler.mail.sender.from=admin@localhost
report.scheduler.mail.sender.protocol=smtp
report.scheduler.mail.sender.port=25
Example configuration for an SMTP relay:
report.scheduler.mail.sender.host=mail.example.com
To apply changes, you will need to restart opsview-reportingmodule:
/opt/opsview/watchdog/bin/opsview-monit restart opsview-reportingmodule
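Because these settings have to be reapplied after each Reporting Module upgrade, scripting the edit can help. This is a sketch using awk, assuming plain key=value lines with no spaces around '='; the function name set_property is hypothetical. The key and value are passed through the environment so that backslashes in passwords are not reinterpreted.

```shell
# set_property (hypothetical helper): set KEY=VALUE in a .properties file,
# replacing existing lines for KEY or appending one if none exists.
# Usage: set_property FILE KEY VALUE
set_property() {
    local file="$1" tmp
    tmp=$(mktemp) || return 1
    # ENVIRON is used instead of awk -v so escape sequences are not processed
    K="$2" V="$3" awk '
        index($0, ENVIRON["K"] "=") == 1 { print ENVIRON["K"] "=" ENVIRON["V"]; done = 1; next }
        { print }
        END { if (!done) print ENVIRON["K"] "=" ENVIRON["V"] }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back with cat so the file keeps its owner and permissions
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, set_property /opt/opsview/jasper/apache-tomcat/webapps/jasperserver/WEB-INF/js.quartz.properties report.scheduler.mail.sender.host mail.example.com, followed by the opsview-reportingmodule restart.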
- When accessing any URL under /jasperserver on an Opsview system without a valid session, a 401 response is returned rather than a redirect to the login page. In this case, users must navigate to the login page manually and log in again.
- Running reports within the Jaspersoft Studio IDE when connected to the Opsview Reporting Module currently results in a 401 error. See Running Reports in Jaspersoft Studio for alternatives.
- Using Jaspersoft Studio when connected to the Opsview Reporting Module can quickly use up available sessions. See Session Manager Config for Jaspersoft Studio for mitigation details.