The master loses connection to the agent, not the other way around.
- We use SCOM (replace with whatever tool is appropriate to your platform) to monitor that the Tidal master services and Tidal failover services themselves do not go down.
- Keep in mind you'll need to work with your mail administrator to add your Tidal servers to the list of hosts allowed to send SMTP mail if you are working with Exchange.
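Before relying on email actions, it can help to confirm that the mail server really accepts mail from the Tidal server. Here is a minimal Python sketch of such a check, using the standard smtplib module. The relay host, addresses, and server label in the comments are hypothetical placeholders, not values from our environment. Run it from the Tidal server itself so the relay sees the right source address.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient, host_label):
    """Build a plain-text test message; host_label only appears in the subject."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Tidal SMTP relay test from {host_label}"
    msg.set_content("If you can read this, the relay accepts mail from this server.")
    return msg


def send_test_message(msg, relay_host, port=25, timeout=10):
    """Send msg through relay_host.

    Raises smtplib.SMTPException if the relay rejects the message,
    or OSError if the connection itself fails.
    """
    with smtplib.SMTP(relay_host, port, timeout=timeout) as smtp:
        smtp.send_message(msg)


# Example call (hypothetical values, replace with your own):
# message = build_test_message("tidal@example.com", "admin@example.com", "tidal-master-01")
# send_test_message(message, "smtp.example.com")
```

If the send raises a relay-denied style error, that is usually the sign that the server has not yet been added on the mail side.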
Two system events you should define for every Tidal master are:
1) se public connection status - one email action followed by an alert action (Security Level=Warning), with this message:
An error alert has been generated for an unplanned connection outage:
Job Connection Name: <SysConnect>
Job Agent: <JobAgent>
2) se admin failover complete - define a system event on "Backup master took over" with an associated email action titled "Failover Notice on <SysDate.M/d/yyyy>" and the following message:
The primary Tidal master has failed over to the backup Tidal master.
Before the end of the next business day, investigate the reason and, as soon as practical, restore service to the primary Tidal master.
For our connection-down messages, we use something like this:
Subject=FYI: Connection Down: <SysConnect> on <SysName>
The Tidal connection mentioned in the subject line has gone down, please investigate.
Possible Root Causes:
1) Maintenance was performed without a defined Tidal outage when a server or service was brought down.
2) The server or service is down (remote into your server to verify; if it is down, escalate with Infrastructure at ext ZZZZ rather than with our Tidal Administrators).
Take appropriate action, contacting a Tidal Administrator or Infrastructure Administrator as necessary, if the connection remains inactive for more than 15 minutes.
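Outside the message itself: if you want to automate the 15-minute rule in your own monitoring rather than leave it to whoever reads the email, the check is simple. A rough Python sketch, assuming you already record when the connection went down (the function and constant names here are made up for illustration and are not part of Tidal):

```python
from datetime import datetime, timedelta

# Threshold taken from the message text: escalate after more than 15 minutes.
ESCALATION_THRESHOLD = timedelta(minutes=15)


def should_escalate(down_since, now, threshold=ESCALATION_THRESHOLD):
    """Return True once the connection has been down for longer than threshold."""
    return now - down_since > threshold


# Example: a connection that dropped 20 minutes ago should be escalated.
dropped = datetime(2024, 1, 1, 9, 0)
print(should_escalate(dropped, dropped + timedelta(minutes=20)))
```

The comparison is strict, so a connection that has been down for exactly 15 minutes is not yet escalated, which matches the "more than 15 minutes" wording.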