Cache Manager

Description

Opsview Cache Manager is a fast, distributed, in-memory key-value store that can be used to cache session data and other disposable data with a short shelf life. Because the data is held only in memory, restarting the Cache Manager may discard previously stored data, and the components that rely on it will then need to repopulate it. The data is also shared between all nodes within the same Collector Cluster, meaning a check run on one node can access data previously stored on another.

The Cache Manager has an API available to client components with both HTTP and HTTPS endpoints. It also uses this API to communicate with other peers within the cluster to implement the distributed cache.

Package Dependencies

  • opsview-python3

Service Dependencies

  • opsview-registry

Installation

The component is deployed on Collectors by Opsview Deploy. The specific playbook is cache-manager-install.yaml.

Configuration

Specific configuration options should be set in /opt/opsview/cachemanager/etc/cachemanager.yaml. Opsview Deploy will automatically configure Cache Manager peers within the cluster.

The following settings can be managed manually, but it is strongly recommended that they are managed via Opsview Deploy to ensure that they are synchronized within each cluster; otherwise, the Cache Manager may not function correctly.

Setting                  Description
worker_timeout           The time a worker remains dormant after an error before restarting (default 30s).
max_cache_size           The maximum size of the local cache (default 1GB, 0B = no limit).
max_item_size            The maximum size of an individual cache entry (default 10MB, 0B = no limit).
peer_refresh_time        The time delay before peers marked as "down" will be rechecked.
peer_connection_timeout  The timeout value when attempting communication with a peer.
timestamp_error_margin   The maximum time difference allowable when validating an encrypted namespace.
cache_purge_time         The time delay before expired items are actively purged from the cache.
http_server              The settings for the HTTP endpoint (including namespace encryption).
https_server             The settings for the HTTPS endpoint (including namespace encryption). The encryption must match that specified in http_server. The client certificate, CA certificate and server certificate should also be specified here. (Generally set up automatically by Opsview Deploy.)
registry                 The connection to Opsview Registry. (Generally set up automatically by Opsview Deploy.)
peer_nodes               The other Cache Manager nodes in the cluster. (Generally set up automatically by Opsview Deploy.)

Management

Start/Stop cachemanager

# Once the Cache Manager is configured on the machine, run the following commands to start it
sudo /opt/opsview/watchdog/bin/opsview-monit reload
sudo /opt/opsview/watchdog/bin/opsview-monit start opsview-cachemanager
# To stop the cachemanager
sudo /opt/opsview/watchdog/bin/opsview-monit stop opsview-cachemanager

Validate cache manager status

Get status of cache manager

curl http://127.0.0.1:8182/status -s | python -m json.tool
{
    "available_peer_refs": [
        "1d20de92-12bd-4067-8563-8d63e83429cc"
    ],
    "cache_items": 0,
    "peers": [
        {
            "deactivated": 0,
            "ref": "1d20de92-12bd-4067-8563-8d63e83429cc"
        }
    ],
    "ref": "1d20de92-12bd-4067-8563-8d63e83429cc",
    "uptime": 4581
}
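
The status document above can also be checked from a script. The following is a minimal sketch, not part of Opsview: the function names are hypothetical, the URL is the one from the curl example, and it assumes that a non-zero "deactivated" value means the peer is currently marked as down (the output above only shows 0 for a healthy peer).

```python
import json
from urllib.request import urlopen

# URL taken from the curl example above; adjust host/port for your deployment.
STATUS_URL = "http://127.0.0.1:8182/status"


def fetch_status(url=STATUS_URL, timeout=5):
    """Fetch and decode the status document from a running Cache Manager."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarise_status(status):
    """Reduce a status document to a few values worth checking.

    Assumption: a non-zero "deactivated" value marks a peer as down.
    """
    peers = status.get("peers", [])
    known = {peer["ref"] for peer in peers}
    available = set(status.get("available_peer_refs", []))
    return {
        "ref": status.get("ref"),
        "uptime": status.get("uptime"),
        "cache_items": status.get("cache_items"),
        "unavailable_peers": sorted(known - available),
        "deactivated_peers": sorted(
            peer["ref"] for peer in peers if peer.get("deactivated")
        ),
    }


# Sample document copied from the output above, so the sketch runs offline.
SAMPLE = {
    "available_peer_refs": ["1d20de92-12bd-4067-8563-8d63e83429cc"],
    "cache_items": 0,
    "peers": [
        {"deactivated": 0, "ref": "1d20de92-12bd-4067-8563-8d63e83429cc"}
    ],
    "ref": "1d20de92-12bd-4067-8563-8d63e83429cc",
    "uptime": 4581,
}

summary = summarise_status(SAMPLE)
print(summary)
```

Against a live Cache Manager, summarise_status(fetch_status()) would produce the same kind of summary.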

Logging

The Cache Manager logs to syslog, in the same way as all other components.

Example logging:

Oct 14 11:13:11 cdb-cachemanager journal: opsview : [NOTICE] Launching Opsview Cache-Manager daemon.
Oct 14 11:13:11 cdb-cachemanager journal: opsview.cachemanager.cachemanagerworker : [NOTICE] Worker started (PID=15807).

Using the Cache Manager

How does the Cache Manager work?

Opsview Deploy, by default, will install one Cache Manager per Collector.
Each Cache Manager is then connected to the Executor and Scheduler via the Load Balancer.
When plugins are executed, they make use of their local Cache Manager.
The local Cache Manager can then communicate with any remote Cache Managers in the same cluster and share data where necessary.


API Methods

The Cache Manager is based on two main API methods, get_data and set_data.

Method    Description
set_data  Input a piece of data into the Cache Manager given a key, namespace and TTL.
get_data  Retrieve data from the Cache Manager given a key, namespace and maximum wait time.

The Cache Manager assigns data to a unique combination of key and namespace.

The namespace ensures naming collisions are avoided and potentially sensitive data cannot be read by other unauthorized plugins.

The key is the identifier for the data in the namespace specified. Data will be valid for the time specified by the TTL (default is 15 minutes).
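
To make the key, namespace and TTL relationship concrete, here is a toy in-memory model. It is only an illustration of the behaviour described above, not the Cache Manager's implementation: the class name ToyCache and the injectable clock are assumptions introduced for this sketch, and namespace encryption is not modelled.

```python
import time


class ToyCache:
    """Illustrative store keyed by (namespace, key), with a per-item TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be shown without sleeping
        self._items = {}      # (namespace, key) -> (expires_at, data)

    def set_data(self, namespace, key, data, ttl=900):
        # 900 seconds mirrors the documented 15 minute default.
        self._items[(namespace, key)] = (self._clock() + ttl, data)

    def get_data(self, namespace, key):
        entry = self._items.get((namespace, key))
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            # Expired items are dropped and treated as missing.
            del self._items[(namespace, key)]
            return None
        return data


# A fake clock lets the example advance time instantly.
now = [0.0]
cache = ToyCache(clock=lambda: now[0])

# The same key in two namespaces refers to two independent entries.
cache.set_data("plugin_a", "token", "abc")
cache.set_data("plugin_b", "token", "xyz", ttl=60)
a_value = cache.get_data("plugin_a", "token")
b_value = cache.get_data("plugin_b", "token")

now[0] = 61.0  # 61 seconds later
b_after = cache.get_data("plugin_b", "token")  # 60 second TTL has passed
a_after = cache.get_data("plugin_a", "token")  # default TTL still valid
```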

When getting data, Cache Manager will wait a maximum amount of time for "The Lock".

What is "The Lock"?

  • If the data does not exist in the Cache Manager, it will return a lock.
  • Obtaining a lock means the Cache Manager expects the component to make the call to get the data directly and then use the set_data method to set the data in the Cache Manager, ready to be used by other components.
  • Any concurrent components calling the get_data method will block if they cannot obtain the lock. This ensures that only one component sets the data.
  • Once the data has been set, all blocked components will be unblocked and return the newly cached data.
  • The max_wait_time parameter of the get_data method has a default of 30 seconds but needs to be large enough for this cycle to be completed.
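
The lock cycle described above can be sketched with threads. This is a toy model under stated assumptions, not the Cache Manager's code: ToyLockingCache, the ("data" / "lock" / "timeout") return values and the behaviour on timeout are all invented for the illustration.

```python
import threading
import time


class ToyLockingCache:
    """Toy model: the first caller for a missing key gets the lock, others wait."""

    def __init__(self):
        self._cond = threading.Condition()
        self._data = {}
        self._locked = set()

    def get_data(self, key, max_wait_time=30):
        """Return ("data", value), ("lock", None) or ("timeout", None)."""
        deadline = time.monotonic() + max_wait_time
        with self._cond:
            while True:
                if key in self._data:
                    return "data", self._data[key]
                if key not in self._locked:
                    # Nobody is fetching this key yet: hand out the lock.
                    self._locked.add(key)
                    return "lock", None
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return "timeout", None  # assumption: waiters give up here
                self._cond.wait(remaining)

    def set_data(self, key, value):
        with self._cond:
            self._data[key] = value
            self._locked.discard(key)
            self._cond.notify_all()  # unblock every waiting caller


cache = ToyLockingCache()
fetch_calls = []   # one entry per call to the slow data source
results = []


def slow_fetch():
    fetch_calls.append(1)
    time.sleep(0.2)  # stands in for a slow API call
    return "payload"


def check():
    state, value = cache.get_data("api_result", max_wait_time=5)
    if state == "lock":
        # The lock holder fetches the data directly and stores it.
        value = slow_fetch()
        cache.set_data("api_result", value)
    results.append(value)


threads = [threading.Thread(target=check) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

In this run, four concurrent checks result in a single call to slow_fetch. If max_wait_time were shorter than the fetch, the waiting checks in this toy model would time out instead, which is why the wait time needs to cover the whole cycle.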

What happens if a Cache Manager goes offline?

It depends on where the data was stored initially and how many Cache Managers you have in your clustered environment.

  • With one Cache Manager, you will lose the stored data and the plugin will have to make the API call again.
  • With two or more Collectors/Cache Managers, where both Collectors have queried the same data, the Cache Manager will be able to retrieve the data from its remote Cache Manager once it comes back online, provided the data has not expired.
  • With two or more Collectors/Cache Managers, where the remote Cache Manager(s) did not query the data, you will lose the stored data and the plugin will have to make the API call again.

Please note, the Cache Manager is not a synchronised data store between collectors. You will only keep the data if both Cache Managers made a call for the same data, prior to the secondary Cache Manager shutting down.
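
The three cases above can be walked through with a small simulation. This is a deliberately simplified model, not how the Cache Manager is implemented: ToyNode and run_scenario are hypothetical names, TTLs are ignored, and the sketch assumes a node keeps a local copy of any data it has queried from a peer.

```python
class ToyNode:
    """Toy Cache Manager node holding only the data it has set or queried."""

    def __init__(self, name):
        self.name = name
        self.local = {}
        self.peers = []

    def set_data(self, key, value):
        self.local[key] = value

    def get_data(self, key):
        if key in self.local:
            return self.local[key]
        for peer in self.peers:
            if key in peer.local:
                # Assumption: data read from a peer is also kept locally.
                self.local[key] = peer.local[key]
                return self.local[key]
        return None

    def restart(self):
        # Storage is in memory only, so a restart empties the local cache.
        self.local.clear()


def run_scenario(two_nodes, peer_queried_first):
    """Store data on node a, restart it, then try to read the data back."""
    a = ToyNode("a")
    if two_nodes:
        b = ToyNode("b")
        a.peers, b.peers = [b], [a]
    a.set_data("k", "v")
    if two_nodes and peer_queried_first:
        b.get_data("k")  # node b now holds its own copy
    a.restart()
    return a.get_data("k")


single_node = run_scenario(two_nodes=False, peer_queried_first=False)
both_queried = run_scenario(two_nodes=True, peer_queried_first=True)
peer_never_queried = run_scenario(two_nodes=True, peer_queried_first=False)
```

Only the scenario in which the peer had already queried the data recovers it after the restart; in the other two, the plugin would have to fetch the data again.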

Building Plugins that use Cache Manager

Cache Manager is now fully integrated with plugnpy. You can import plugnpy into your Python monitoring script and make full use of all the Cache Manager's features.

plugnpy comes with an HTTP client which is able to connect to the opsview-cachemanager component. This allows plugins to use the Cache Manager to store temporary data in memory, where it can be consumed by other service checks which require the same data.

The module consists of two classes, namely CacheManagerClient and CacheManagerUtils, which provide easy to use interfaces to communicate with the opsview-cachemanager.

CacheManagerClient

A simple client to set or get cached data from the Cache Manager.

The Cache Manager client requires the namespace of the plugin and the host IP address and port number of the Cache Manager to be supplied. These are provided to the plugin as opsview-executor encrypted environment variables, or you can just specify them manually.

Optionally, when creating the client, the concurrency, connection_timeout and network_timeout parameters can be specified to modify the number of concurrent HTTP connections allowed (default: 1), the number of seconds before the HTTP connection times out (default: 30) and the number of seconds before the data read times out (default: 30), respectively.

client = CacheManagerClient(host, port, namespace, concurrency=1, connection_timeout=30,
                            network_timeout=30)

Once a Cache Manager client has been created, the get_data and set_data methods can be used to get and set data respectively.

  • The set_data method can be called with the key and data parameters; this will store the specified data under the given key.
  • Optionally, the ttl parameter can be used to specify the number of seconds the data is valid for (default: 900). It is expected that session information and other temporary data will be stored in the Cache Manager.
  • 15 minutes has been chosen as the default to ensure data does not have to be recreated too often, but in the event of a change in data, the cached information does not persist for too long.

client.set_data(key, data, ttl=900)

  • The get_data method can be called with the key parameter to retrieve data stored under the specified key.
  • Optionally, the max_wait_time parameter can be used to specify the number of seconds to wait before timing out (default: 30).

client.get_data(key, max_wait_time=30)

CacheManagerUtils

To simplify calls to the Cache Manager, plugnpy provides a helper utility method, get_via_cachemanager, which creates the Cache Manager client and calls the get_data and set_data methods as required.

This method expects five parameters:

  • no_cachemanager: True if Cache Manager is not required, False otherwise.
  • key: The key to store the data under.
  • func: The function to retrieve the data, if the data is not in the Cache Manager.
  • args: The arguments to pass to the user's data retrieval function.
  • kwargs: The keyword arguments to pass to the user's data retrieval function.

# Example data retrieval function: returns the input string reversed
def api_call(string):
    return string[::-1]

CacheManagerUtils.get_via_cachemanager(no_cachemanager, 'my_key', api_call, 'hello')

  • In this example, if the data exists in the Cache Manager under the key 'my_key', the call to get_via_cachemanager will simply return the data.
  • However, if the data does not exist in the Cache Manager, the call to get_via_cachemanager will call the api_call method with the argument 'hello' and then set the data in the Cache Manager, so future calls can use the data from the Cache Manager.
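
The control flow described in the two points above can be sketched as follows. This is not plugnpy's source: FakeClient is an in-memory stand-in that only mimics the get_data and set_data signatures shown earlier, get_via_cache is a hypothetical function that takes the client as an argument to stay self-contained, and the sketch assumes get_data returns None when nothing is cached.

```python
class FakeClient:
    """In-memory stand-in for a Cache Manager client (illustration only)."""

    def __init__(self):
        self._store = {}

    def get_data(self, key, max_wait_time=30):
        # Assumption: None signals that no data is cached under this key.
        return self._store.get(key)

    def set_data(self, key, data, ttl=900):
        self._store[key] = data


def get_via_cache(client, no_cachemanager, key, func, *args, **kwargs):
    """Return cached data for key, calling func only when it is missing."""
    if no_cachemanager:
        # Cache Manager not required: always call the function directly.
        return func(*args, **kwargs)
    data = client.get_data(key)
    if data is None:
        data = func(*args, **kwargs)
        client.set_data(key, data)  # future calls are served from the cache
    return data


calls = []  # records each real call to the data source


def api_call(string):
    calls.append(string)
    return string[::-1]


client = FakeClient()
first = get_via_cache(client, False, "my_key", api_call, "hello")
second = get_via_cache(client, False, "my_key", api_call, "hello")
calls_with_cache = len(calls)

bypassed = get_via_cache(client, True, "my_key", api_call, "hello")
calls_after_bypass = len(calls)
```

The first call runs api_call and stores the result; the second is answered from the cache. With no_cachemanager set to True, the function is called again regardless of what is cached.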

Debugging

To see what is going on inside the Cache Manager as you are making calls, you must start it in debug mode.

# Stop the Cache Manager
sudo /opt/opsview/watchdog/bin/opsview-monit stop opsview-cachemanager
  
# Start the Cache Manager in debug mode using -d flag
sudo -iu opsview /opt/opsview/cachemanager/venv/bin/cachemanagerlauncher -d
  
Oct 14 11:13:11 cdb-cachemanager journal: opsview : [NOTICE] Launching Opsview Cache-Manager daemon.
Oct 14 11:13:11 cdb-cachemanager journal: opsview.cachemanager.cachemanagerworker : [NOTICE] Worker started (PID=15807).