
CWA\TES Alerts for connection back up

dcgriffin
Level 1

I'm wondering if there is any way to receive an email alert when a connection comes back up. We currently have alerts that trigger when an agent or adapter goes down (Tidal lost connection with an agent/adapter), but we have no way of telling whether this is just a minor network blip, a server being rebooted, or a more serious problem. If we received a notification when the connection comes back up after a network blip, we would know we don't have to go in and investigate. Even with a 5 or 10 minute polling cycle, we'd know that if we hadn't received the "connection back up" email within 5-10 minutes, we had better go in and look. If we get the email saying it's back up, then all is good. Any help would be appreciated. Thanks.

2 Replies

Cale Montgomery
Level 1

I've managed something like this by using jobs to monitor connection status in addition to the system alerts. It seems to work in my test system.

 

Create a job called "Monitor Server X", one for each server to be monitored (though I'd like to find a way to make it more universal; see the final paragraph of this post for thoughts on that). If you end up with a large number of them, place them in their own parent group for better organization. The Agent/Adapter will be whichever server is being monitored. The job simply runs sleep.exe for long durations and can be set to repeat as often as needed, so that something is constantly running in the background on that agent.

 

Create a job event called "Connection Restored Monitor". The trigger will be Job Orphaned; a running job seems to enter this state when it loses its connection. I've only tested this by manually enabling/disabling connections, so live behavior may differ.

 

Create a job action called "Connection Restored Monitor" which is launched by the "Connection Restored Monitor" event.  Use the "Inherit Agent from Job/Event that triggered this Action" option.

 

Create a job called "Connection Restored Monitor" which the action can launch. It will be another sleep.exe job with a 1 second duration. Associate it with an appropriate email event and action set to trigger on Completed Normally. Use the <SysConnect> variable in the email so the message names whichever agent this job ran on.

 

That should lead to the following flow...

 

The Server Monitor job is running constantly and goes into an Orphaned state when the agent is lost. It will then launch a Connection Restored Monitor job using the same agent, which immediately goes into an "agent unavailable" state. When I manually disabled the agent, the state shown was "Agent Disabled".

 

From my testing, the Connection Restored Monitor job will launch when the agent becomes available.  It completes after the 1 second sleep and then sends its email announcing that the <SysConnect> agent is back up.  The Orphaned Server Monitor job resumes as well.
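To make the sequence easier to follow, here is a rough Python sketch of the flow described above. It is only a toy model of the behavior I saw in test, not anything Tidal exposes: the `Agent` and `Scheduler` classes and all of their method names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    name: str
    connected: bool = True


@dataclass
class Scheduler:
    """Toy model of the monitor flow. Hypothetical, not the Tidal API."""
    waiting: List[Agent] = field(default_factory=list)  # queued restore jobs
    emails: List[str] = field(default_factory=list)     # emails "sent" so far

    def agent_lost(self, agent: Agent) -> None:
        # The monitor job goes Orphaned, the job event fires, and the action
        # inserts the restore job on the same (inherited) agent, where it waits.
        agent.connected = False
        self.waiting.append(agent)

    def agent_restored(self, agent: Agent) -> None:
        agent.connected = True
        self.dispatch()

    def dispatch(self) -> None:
        # A waiting restore job only runs once its agent is reachable again.
        still_waiting = []
        for agent in self.waiting:
            if agent.connected:
                # The 1 second sleep completes normally, so the email fires.
                self.emails.append(f"Connection to {agent.name} is back up")
            else:
                still_waiting.append(agent)
        self.waiting = still_waiting


sched = Scheduler()
server_x = Agent("ServerX")

sched.agent_lost(server_x)
print(sched.emails)  # nothing is sent while the agent is still down

sched.agent_restored(server_x)
print(sched.emails)  # the restore job has run and queued its email
```

The point of the sketch is that the restore job does no checking of its own. It just sits in the queue, and the fact that it was finally able to run is the signal that the agent is back.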

 

Give that a shot. I'm curious to see if it works in production. It's possible that a System Event triggered by "Lost connection to agent/adapter" can also pass the triggering agent through in the same way as the job event/action combo above, where the <SysConnect> variable seems to work fine. If so, that would eliminate the need for the Monitor Server X jobs. I haven't been able to test that, however, as manually disabling a connection doesn't seem to trigger "Lost connection to agent/adapter".

Update: yes, this can be done from a System Event that uses the "Lost connection to agent/adapter" trigger. I added the Connection Restored Monitor job action to the system event and then changed the machine name of one of the test connections to an invalid value. The system event triggered on the loss of the agent and inserted the job, passing through the lost agent.

 

So to revise...

 

Create a job which runs sleep.exe for 1 second. Create a job action which launches that job, and use the "Inherit agent from Job/Event that triggered this Action" option rather than a static agent. Insert this job action into your agent monitor system event.

 

That sleeper job will go in as Agent Unavailable and wait.  When the agent comes back up, the job will run and complete normally, which is the trigger to use for sending out an email with <SysConnect> to alert required parties that the connection has been reestablished.
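On the receiving side, the "lost" and "restored" emails can be paired up the way the original question describes. Below is a small Python sketch of that decision, assuming you record when each email arrived. The `needs_investigation` function and the 10 minute default grace period are my own illustration, not anything built into Tidal.

```python
from datetime import datetime, timedelta
from typing import Optional


def needs_investigation(
    lost_at: datetime,
    restored_at: Optional[datetime],
    now: datetime,
    grace: timedelta = timedelta(minutes=10),  # assumed grace period
) -> bool:
    """Return True when a lost connection should be looked at by a person."""
    # A "restored" email received after the "lost" alert means all is good.
    if restored_at is not None and restored_at >= lost_at:
        return False
    # No "restored" email yet: only escalate once the grace period has passed.
    return now - lost_at > grace


lost = datetime(2024, 1, 1, 12, 0)
print(needs_investigation(lost, None, lost + timedelta(minutes=5)))
print(needs_investigation(lost, None, lost + timedelta(minutes=15)))
print(needs_investigation(lost, lost + timedelta(minutes=3), lost + timedelta(minutes=15)))
```

Pick a grace period that is a bit longer than a normal reboot on your servers, so routine restarts don't page anyone.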
