I am seeing regular disconnect messages for the Windows Server agents in the logs in both of my Tidal environments. The disconnects are usually brief (1-5 seconds). So far job execution has not been disrupted, but I would like to find out what is causing them. Has anyone seen similar issues who can suggest where to begin troubleshooting?
Bill, we faced the same issue, and it impacted some of our jobs as well; they ended in errors such as "Exception getting job result". We escalated the issue to our OS team, and I think they made some changes to registry entries, which seems to have solved it. We rarely face the issue now.
One more case: if your Windows servers and TIDAL are not on the same network/LAN, you will see such problems. Those depend purely on the network service provider, which we don't have control over.
Thanks for the reply,
All our servers are on the same network. What I have noticed is that the disconnects occur at regular 55-minute intervals for each remote agent. We rebooted the primary master for MS patching on Saturday, and the disconnects stopped while the backup master was running as the primary. Once the backup master was patched and rebooted on Sunday, the primary took over again and the disconnects resumed.
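For anyone who wants to confirm a fixed interval like this from their own logs, here is a minimal Python sketch. It assumes you have already pulled the disconnect times for one agent out of the master log into `datetime` values; the function names `disconnect_intervals` and `is_regular` are made up for illustration and are not part of Tidal.

```python
from datetime import datetime, timedelta


def disconnect_intervals(timestamps):
    """Return the gaps between consecutive disconnect times.

    `timestamps` is any iterable of datetime objects for a single agent;
    they are sorted first so the input order does not matter.
    """
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def is_regular(intervals, expected, tolerance=timedelta(minutes=1)):
    """True if there is at least one gap and every gap is within
    `tolerance` of the `expected` interval."""
    return bool(intervals) and all(
        abs(gap - expected) <= tolerance for gap in intervals
    )
```

Calling `is_regular(disconnect_intervals(times), timedelta(minutes=55))` on one agent's disconnect times gives a quick yes/no on whether the gaps really sit at 55 minutes. A fixed gap tends to point at a timer or timeout somewhere rather than random network trouble, though that is only a starting hypothesis.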
I have engaged our OS team as well. Do you recall what registry changes the OS team had to make? Is it related to a security or policy setting?
I have a case open with support as well.
Update - no word on the progress for bug CSCuo75918, but I did see another posting on this site that suggests the license file may be the source:
We are still experiencing this issue after each of our monthly maintenance periods. If you could provide more details on the registry entries, we would truly appreciate it.
This happened to us in January and it's happening again right now. Here is the email we received from support when this happened the first time. I just opened a new case because I want to know what causes it:
The problem is that the Agent ID in master memory does not match what is in the database and how the master registered itself on the agent machine. Try the steps below and you should be good to go.
1. Stop the Tidal Agent.
2. Delete the TIDAL_AGENT_1 temp directory.
3. Assuming that you are running on PM: Stop and restart the BM service (this is to clear out anything that may be in memory).
4. Failover to BM.
5. Stop and restart the PM service.
6. Restart the Tidal Agent. The temp directory will recreate.
7. Verify connectivity.
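The email does not say how to verify connectivity. One rough way to check from the master that the agent is at least accepting TCP connections is sketched below in Python. The host name and port you pass in are your own values (look up the port your agent is configured to listen on); `can_connect` is a made-up helper name, not a Tidal tool, and a successful connect only shows the port is reachable, not that the master and agent IDs match.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within
    `timeout` seconds, False on refusal, timeout, or lookup failure."""
    try:
        # create_connection resolves the host and opens the socket;
        # the with-block closes it again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("my-agent-host", 12345)` with your real agent host and port in place of those placeholder values. The connection status shown in the Tidal client is still the real check.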
Posting an update:
I did try the suggested procedure to delete the Tidal_agent_1 temp files on the remote agents and restart, but this had no effect on the 55-minute disconnects. I still have a case open with support and we are still working on it. Most recently I sent support Wireshark captures taken on the master during a disconnect.