WatchDog Project

The WatchDog Engine project was started in response to a customer request for monitoring of the FSN engines. Because the FSN engines are complex, they would occasionally stop or lock up, and the problem could never be reproduced. The WatchDog engine was proposed as a way to raise an alert when a predetermined set of conditions is met, using the AutoTask API to generate a trouble ticket. This allows an engine to be restarted before its stoppage causes any data processing issues.
General Info
The WatchDog is a monitoring system that tracks the health of each of the five FSN engines. For the Contract and Misc engines, it also passes additional processing information to the UI for the user to view. The WatchDog does not control any of the engines; it only gathers metrics.
The display shows each engine's last poll time, last log entry time, and status. For the Contract and Misc engines, it also shows what each engine is currently processing, along with metrics on how fast processing is being done. Each item is explained in detail below.
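The display data described above could be modeled as one record per engine, with the extra processing details present only for the Contract and Misc engines. The sketch below is illustrative only: `EngineRecord`, `ProcessingInfo`, the `EngineStatus` values, and all field names are hypothetical, since the actual WatchDog data model is not described in this document.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class EngineStatus(Enum):
    # Hypothetical status values; the real WatchDog states are not listed here.
    ONLINE = "online"
    OFFLINE = "offline"
    UNKNOWN = "unknown"


@dataclass
class ProcessingInfo:
    # Extra details assumed to be reported only by the Contract and Misc engines.
    current_item: str        # what the engine is processing right now
    items_per_minute: float  # a simple processing-speed metric


@dataclass
class EngineRecord:
    # Fields shown for every engine on the WatchDog display (hypothetical names).
    name: str
    last_poll: datetime
    last_log: datetime
    status: EngineStatus
    processing: Optional[ProcessingInfo] = None  # None for engines without extra info

    def summary(self) -> str:
        """Build a one-line, display-style summary of this record."""
        line = (f"{self.name}: {self.status.value}, "
                f"last poll {self.last_poll:%H:%M:%S}, "
                f"last log {self.last_log:%H:%M:%S}")
        if self.processing is not None:
            line += (f", processing {self.processing.current_item} "
                     f"at {self.processing.items_per_minute:.1f}/min")
        return line


if __name__ == "__main__":
    # Illustrative values only.
    contract = EngineRecord(
        name="Contract",
        last_poll=datetime(2024, 1, 1, 9, 30, 0),
        last_log=datetime(2024, 1, 1, 9, 29, 45),
        status=EngineStatus.ONLINE,
        processing=ProcessingInfo(current_item="batch 42", items_per_minute=12.5),
    )
    print(contract.summary())
```

Keeping the Contract and Misc details in an optional field lets the UI render all five engines with the same code and show the extra information only when `processing` is set.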
If configured to do so, the WatchDog engine automatically generates AutoTask tickets to alert Answers Systems to potential problems before an engine has been offline for too long. Two conditions trigger a ticket. The first is when an engine goes offline and the WatchDog can no longer communicate with it; in this case, the ticket is created after a configurable delay of 2 hours. The second is when both the poll time and the log time of an engine are flagged as out of spec; in this case, a ticket is created immediately to alert Answers to the potential problem. The maximum log and poll times are configurable per engine and are displayed with each engine.
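The two ticket triggers amount to a small decision rule. The following is a minimal sketch of that rule, not the WatchDog's actual code: `EngineLimits` and `ticket_reason` are hypothetical names, the 5- and 10-minute limits in the demo are made-up examples, and the AutoTask API call itself is deliberately left out. It also assumes that while an engine is unreachable only the offline trigger applies, so the 2-hour delay is not bypassed by stale poll and log times.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class EngineLimits:
    # Per-engine thresholds (hypothetical names); both are configurable per engine.
    max_poll: timedelta  # longest allowed gap since the last poll
    max_log: timedelta   # longest allowed gap since the last log entry


def ticket_reason(now: datetime,
                  last_poll: datetime,
                  last_log: datetime,
                  limits: EngineLimits,
                  offline_since: Optional[datetime] = None,
                  offline_delay: timedelta = timedelta(hours=2)) -> Optional[str]:
    """Return the reason a ticket should be raised now, or None if no trigger fires."""
    if offline_since is not None:
        # Trigger 1: engine unreachable; wait out the (configurable) delay first.
        # Assumption: while offline, only this trigger is evaluated.
        if now - offline_since >= offline_delay:
            return "offline"
        return None
    # Trigger 2: poll AND log times both out of spec -> ticket immediately.
    poll_late = now - last_poll > limits.max_poll
    log_late = now - last_log > limits.max_log
    if poll_late and log_late:
        return "poll and log out of spec"
    return None


if __name__ == "__main__":
    # Made-up per-engine limits for illustration.
    limits = EngineLimits(max_poll=timedelta(minutes=5), max_log=timedelta(minutes=10))
    now = datetime(2024, 1, 1, 12, 0)
    # Both times stale: immediate ticket.
    print(ticket_reason(now, now - timedelta(minutes=6), now - timedelta(minutes=11), limits))
    # Only the poll time is stale: no ticket yet.
    print(ticket_reason(now, now - timedelta(minutes=6), now - timedelta(minutes=1), limits))
    # Unreachable for 3 hours: past the 2-hour delay, so a ticket is raised.
    print(ticket_reason(now, now, now, limits, offline_since=now - timedelta(hours=3)))
```

A production version would also need to remember which tickets were already raised, so that the rule does not open a new ticket on every 30-second poll; that bookkeeping is not shown.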
Each engine communicates with the WatchDog asynchronously through an engine monitor. This allows engine statuses to be updated between WatchDog polls, which occur every 30 seconds by default. Communication with the app is also asynchronous and goes through a WatchdogUI, which allows values to be displayed in the app at any time, whether they were just updated or are a couple of minutes old.
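The decoupling described above (monitors push updates whenever they happen, the WatchDog polls on its own schedule, and the UI reads the latest values along with their age) can be sketched with a shared latest-value store. This is an illustration under assumptions: the monitors and the poll loop run as `asyncio` coroutines in a single process, and `StatusBoard`, `engine_monitor`, `watchdog_poll`, and `demo` are hypothetical names. The real transport between the engines, the WatchDog, and the app is not described in this document.

```python
import asyncio
import time
from typing import Dict, List, Tuple


class StatusBoard:
    """Latest status per engine with the time it was recorded (hypothetical design)."""

    def __init__(self) -> None:
        self._latest: Dict[str, Tuple[str, float]] = {}

    def publish(self, engine: str, status: str) -> None:
        # Called by an engine monitor whenever its engine reports something,
        # independently of the WatchDog poll schedule.
        self._latest[engine] = (status, time.monotonic())

    def snapshot(self) -> Dict[str, Tuple[str, float]]:
        # Read side used by the UI: each status plus its age in seconds,
        # whether it was updated a moment ago or minutes ago.
        now = time.monotonic()
        return {name: (status, now - ts) for name, (status, ts) in self._latest.items()}


async def engine_monitor(board: StatusBoard, engine: str,
                         updates: List[str], gap: float) -> None:
    # Simulated monitor: publishes each update as soon as it "arrives".
    for status in updates:
        board.publish(engine, status)
        await asyncio.sleep(gap)


async def watchdog_poll(board: StatusBoard, interval: float, rounds: int,
                        seen: List[Dict[str, Tuple[str, float]]]) -> None:
    # Poll loop: every `interval` seconds (30 s by default in the WatchDog),
    # record whatever the monitors have published most recently.
    for _ in range(rounds):
        await asyncio.sleep(interval)
        seen.append(board.snapshot())


async def demo() -> str:
    board = StatusBoard()
    seen: List[Dict[str, Tuple[str, float]]] = []
    # Short intervals keep the demo fast; all three updates arrive before
    # the first poll, so the poll sees only the most recent one.
    await asyncio.gather(
        engine_monitor(board, "Contract", ["starting", "processing", "idle"], 0.01),
        watchdog_poll(board, interval=0.2, rounds=1, seen=seen),
    )
    return seen[0]["Contract"][0]  # status observed by the single poll


if __name__ == "__main__":
    print(asyncio.run(demo()))
```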
The new WatchDog display includes the following elements:
- Engine name.
- Last update.
- Maximum time allowed without a log update before an alarm is sent.
- Maximum time allowed without a poll before an alarm is sent.
- Visual status indicator.
- Visual status indicator of past 2 polls.
- Visual status indicator of overall engine health.
- Poll time.