The Scheduler receives collection plans from the Orchestrator and uses these to schedule host and service checks, which are then run by the Executor. It is the single place in the system where state is calculated.
The Scheduler also:
- Calculates and sends notifications.
- Processes real-time actions, such as set-state or re-check.
- Calculates and processes flapping events.
- Processes acknowledgements.
- Manages BSM state (on the master Collector).
- Processes Downtime events.
- Manages the Automonitor process.
The Scheduler is capable of propagating state dependencies across different Collectors and Clusters. This synchronisation mechanism operates via the Orchestrator.
If a new Scheduler starts up, it registers itself with the Orchestrator via the registry. This registration is refreshed every 20 seconds. If it is the first Collector in the system and not yet registered, the Orchestrator will automatically record and register it in the database. Otherwise, the Orchestrator will just record the Collector entry and mark it as active. In this case, the user will need to manually register the Collector.
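The 20-second registration refresh can be pictured as a simple heartbeat loop. This is a minimal sketch only, not the Scheduler's actual code: the `register` callable and the `stop` event are hypothetical stand-ins for whatever the real component uses to write its entry to the registry and to shut down.

```python
import threading
from typing import Callable

REFRESH_SECONDS = 20  # refresh interval stated in the text


def run_registration_loop(
    register: Callable[[], None],   # hypothetical: writes this Scheduler's entry to the registry
    stop: threading.Event,          # hypothetical: set when the Scheduler shuts down
    interval: float = REFRESH_SECONDS,
) -> None:
    """Register immediately, then refresh the registration every `interval` seconds."""
    while True:
        register()
        # Event.wait returns True as soon as `stop` is set, so shutdown is prompt.
        if stop.wait(interval):
            return
```

Waiting on the event rather than sleeping means the loop exits without having to sit out the remainder of an interval.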
If the Scheduler was started after a period of downtime, it waits 10 seconds (by default) for the Orchestrator to detect it and deliver a collection plan. If no plan is received, the Scheduler will attempt to load a previous plan from disk and start executing it.
If the Scheduler was just re-started, it sends a request to the Orchestrator for an updated collection plan, because the Orchestrator would not automatically detect a Scheduler re-start. The same wait period (10 seconds by default) then applies.
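The two start-up paths above can be sketched as one function. This is an illustration under stated assumptions, not the real implementation: the in-process `queue.Queue`, the JSON plan file, and the `request_plan` callable are all hypothetical, and the sketch assumes the fall-back to the on-disk plan also applies after a re-start.

```python
import json
import queue
from pathlib import Path
from typing import Callable, Optional

PLAN_WAIT_SECONDS = 10  # default wait described in the text


def acquire_plan(
    plan_queue: queue.Queue,           # hypothetical: where delivered plans arrive
    cached_plan: Path,                 # hypothetical: previous plan saved as JSON
    restarted: bool,
    request_plan: Callable[[], None],  # hypothetical: asks the Orchestrator for a plan
    wait_seconds: float = PLAN_WAIT_SECONDS,
) -> Optional[dict]:
    """Return a collection plan, preferring a fresh one over the plan on disk."""
    if restarted:
        # The Orchestrator does not notice a re-start, so ask explicitly.
        request_plan()
    try:
        return plan_queue.get(timeout=wait_seconds)
    except queue.Empty:
        pass
    # No plan arrived in time: fall back to the previous plan, if there is one.
    if cached_plan.exists():
        return json.loads(cached_plan.read_text())
    return None
```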
Although the Scheduler itself did not exist in Opsview 5.x, there are some differences in the overall behavior of the system:
- *PROBLEMID macros: These are now UUIDs which are generated whenever an object moves into a problem state from the OK state. They are cleared when the object becomes OK again.
- LONG(HOST|SERVICE)OUTPUT macros: Any trailing new-line is removed. For example, on Opsview 5.x, you may get "Line 2\nLine 3\n", but in Opsview 6.x, you will just get "Line 2\nLine 3".
- LASTSTATECHANGE is set on Opsview 6.x, but blank on Opsview 5.x.
- There are differences in how check periods and timed exceptions work.
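The first two macro changes can be illustrated with a short sketch. The function names and the state strings are hypothetical, chosen only for the example. The sketch also assumes that exactly one trailing new-line is stripped, as in the example above; the text does not say what happens when there are several.

```python
import uuid
from typing import Optional


def next_problem_id(previous_state: str, new_state: str, current_id: Optional[str]) -> Optional[str]:
    """Model the *PROBLEMID lifecycle: new UUID on OK -> problem, cleared on return to OK."""
    if new_state == "OK":
        return None                 # cleared when the object becomes OK again
    if previous_state == "OK":
        return str(uuid.uuid4())    # generated on entering a problem state from OK
    return current_id               # problem -> problem: keep the existing ID


def strip_trailing_newline(output: str) -> str:
    """Model the 6.x LONG(HOST|SERVICE)OUTPUT behaviour (assumes a single new-line is removed)."""
    return output[:-1] if output.endswith("\n") else output
```

For instance, `strip_trailing_newline("Line 2\nLine 3\n")` gives `"Line 2\nLine 3"`, matching the 6.x output described above.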
The Scheduler requires access to the MessageQueue, Registry and DataStore. Please make sure the MessageQueue, Registry and DataStore are installed, configured and running before attempting to run the Scheduler process.
Refer to Advanced Automated Installation.
The user configuration options should be set in "/opt/opsview/scheduler/etc/scheduler.yaml". Default values are shown in "/opt/opsview/scheduler/etc/scheduler.defaults.yaml", but changes should not be made here since the file will get overwritten on package update.
The following options can be set:
- local_message_queue: The connection/encryption to the local MessageQueue component.
- master_message_queue: The connection/encryption to the master MessageQueue component.
- collector_queue: The queue template on which to receive collection plans.
- execution_queue: The queue to send execution messages to.
- orchestrator_queue: The queue to send messages to request collection plans.
- results_queue: The queue to send results to. This is normally done via the Results-Sender.
- state_data: The local Datastore where state can be persisted.
- registry: The connection to the local Registry.
- logging: General logging config.
The Scheduler also manages the Automonitor Manager process. The log level of the Automonitor Manager can be configured in "/opt/opsview/scheduler/etc/scheduler.yaml" using:
automonitormanager:
  loggers:
    opsview:
      level: NOTICE
There are two further settings available for Automonitor:
scheduler:
  automonitor:
    workers: 10
    concurrent_scans: 1
- workers: The number of hosts that may be inspected concurrently. Default: 10.
- concurrent_scans: The maximum number of scans that may be run at any one time on the Scheduler. Default: 1.
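One way to picture how the two Automonitor limits interact is a semaphore that caps the number of scans, with a worker pool inside each scan. This is a sketch under assumptions, not the Automonitor Manager's real design: the `ScanLimiter` class and the `inspect` callable are hypothetical, and the sketch assumes the `workers` limit applies per scan.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


class ScanLimiter:
    """Hypothetical model of the `workers` and `concurrent_scans` settings."""

    def __init__(self, workers: int = 10, concurrent_scans: int = 1) -> None:
        self._workers = workers
        # At most `concurrent_scans` scans may hold a slot at any one time.
        self._scan_slots = threading.BoundedSemaphore(concurrent_scans)

    def run_scan(self, hosts: Iterable[str], inspect: Callable[[str], str]) -> List[str]:
        """Inspect every host, at most `workers` at a time, once a scan slot is free."""
        with self._scan_slots:
            with ThreadPoolExecutor(max_workers=self._workers) as pool:
                # pool.map returns results in the same order as `hosts`.
                return list(pool.map(inspect, hosts))
```

With the defaults, a second scan started while one is in progress would block until the first finishes, and within a scan no more than ten hosts would be inspected at once.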
Watchdog service files are now managed by the package. Removing the package leaves the watchdog service file behind with a .save extension; purging the package removes it. The package managed config files are as follows
Watchdog service files are now managed by the package. Any modifications will be saved during upgrade and removal with the .rpmnew and .rpmsave extensions respectively.
As root, start, stop and restart the service using:
/opt/opsview/watchdog/bin/opsview-monit <start|stop|restart> opsview-scheduler
The scheduler component should only be stopped or restarted via the watchdog as described in Service Administration. If the scheduler process is killed improperly via the command line (e.g. with the kill -9 command), it is possible for a subprocess to be orphaned, preventing the watchdog from restarting the component. This will show on the process list as a single “schedulerlauncher” process, rather than the correct process tree:
$ ps auxf | grep scheduler | grep -v grep
opsview 22484 0.4 0.5 694528 47716 ? Sl 12:08 0:00 /opt/opsview/scheduler/venv3/bin/python /opt/opsview/scheduler/venv3/bin/schedulerlauncher
When this happens, checks cannot be scheduled for execution. To fix this, you must manually restart the opsview-scheduler component as per Service Administration.