
HOW TO: Create email alert for Time Outs

jeff.hamway01

Hello,

I'm having a difficult time creating email alerts specifically for time-outs. I'm not sure which 'Event Trigger' to use so that it alerts specifically on time-out issues. Does anyone have any ideas?

Details: I'm using Tidal 6.2.1. When a job or job group times out, I want to send an email alert. I already have alerts for jobs completing abnormally and for jobs running longer than expected.


John Laird
Cisco Employee
You should try the trigger "Job not ready by end of its time window".

John

Thank you for the very quick reply. I thought about using the "Job not ready by end of its time window" event trigger, but I don't think that is going to work. See below:

I have a job stream that runs from 6am to 12am.

In the job stream is a PeopleSoft job.

The PeopleSoft job recently timed out because it could not connect to the SAN. Only the PeopleSoft job timed out; the other jobs did not.

So the stream is still running, but the PeopleSoft job is in Timed Out status and the stream is still within its time window.

I want to alert specifically when there is a time-out, even though the stream is still within its time window. Thoughts?

If triggering an event on Job Stopped doesn't work, what does the job's output look like when it goes to Timed Out? In the Run tab of the job, you can set Scan Output: Abnormal Strings. If the output contains something usable, that would put the job into Completed Abnormally.
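To make the abnormal-string idea concrete, here is a rough model of how such a setting could combine with exit-code checking. This is a hypothetical Python sketch, not Tidal code: the function name, the plain case-sensitive substring match, and the sample output "Unable to connect" are all assumptions made for illustration.

```python
# Hypothetical model of "Scan Output: Abnormal Strings"; not Tidal code.
# Assumption: a plain, case-sensitive substring match on the job's output.


def completion_status(exit_code: int, output: str, abnormal_strings: list[str]) -> str:
    """Return the status this simplified model assigns to a finished job."""
    # Any configured abnormal string found in the output forces an abnormal completion.
    if any(s in output for s in abnormal_strings):
        return "Completed Abnormally"
    # Otherwise fall back to the usual exit-code rule (0 = success).
    return "Completed Normally" if exit_code == 0 else "Completed Abnormally"


if __name__ == "__main__":
    log = "PSJob: Unable to connect to SAN volume"  # made-up output line
    print(completion_status(0, log, ["Unable to connect"]))
    print(completion_status(0, "done", ["Unable to connect"]))
```

The model also shows the catch raised in the reply: the string list is per-job configuration layered on top of the exit-code rule, so catching this case everywhere means touching most job definitions.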

Cale,

Yes, I could use 'scan abnormal string', but we use the exit code for just about all of our jobs. To catch this with 'scan abnormal string', I would have to adjust 90% of my jobs.

I do appreciate the feedback, but I'm not sure that would work. I haven't found a good answer to this dilemma yet. We currently use 'eyeball' technology to catch these ;).

Okay, we don't want you making radical changes to output strings if you're reliant on them for normal processing.

The "Job Stopped" trigger isn't working. I have to assume that the "Job Running Longer Than Its Maximum Time" and "Job Running Past Time Window" triggers have both been considered and weren't working either, which is frustrating. Either a job runs past its expected time or it's stopped. It has to be one of the two, but apparently not.

You could always replace the eyeballs with another job. Create a job that is timed to run at the point where your hanging job is supposed to have ended. Give it a dependency on the hanging job that checks for a status of Completed Normally but uses the <> (not equal) operator.

I hope 6.2 supports operators on job dependencies. It's been long enough since I last used that version that I don't remember. I'm also having trouble finding a complete User's Guide that lists all of the job dependency options.

The monitoring job will launch and then squawk if the hanging job has not completed normally by then. It's a clumsy solution: it will send duplicate emails if the hanging job completes abnormally instead of going to Timed Out, and it requires a similar monitoring job for each hanging job with this problem.

Does 6.2 allow both Job Events and Group's Events on parent job groups? You could try adding an event at that level using the "Triggered After Job's Own Events" option. If something about the Timed Out status is interfering with the hanging job's own events, applying "Job Stopped", "Job Running Past Time Window", or "Job Running Longer Than Maximum Time" at the group level might catch those conditions.

Note that a parent's Job Events apply only to that parent job group, while a parent's "Group's Events" apply to each job contained within that job group. The Group's Events tab is where you can set an event to trigger before or after a job's own event.

Oh. Hold on. It's not connecting to the SAN; does that mean the hanging job is not even launching correctly? It's certainly dying before the end of its time window, which means "Job Running Past Time Window" is out anyway. Sorry about that; I should have re-read more carefully before replying.

If it's dying before it reaches an Active state, then yes, Job Stopped won't work. "Launch Error" might, but I haven't worked with that enough to know for certain.

This might not even trip "Job Running Longer Than Maximum Time" if the job itself isn't considered to be running. The 6.3 documentation describes "Timed Out" as a job that could not launch by the end of its date/time window and is not set to "Run Again Tomorrow" in its Time Window settings.

That makes me wonder why the PeopleSoft stream is active if the job did not launch properly, but we're really only worried about the Timed Out status anyway.
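The 6.3 definition of Timed Out quoted above can be read as a simple rule. The sketch below is a hypothetical simplification in Python, not Tidal code: the function name and parameters are made up, and the real scheduler surely handles more cases. Under this reading, the status can only appear once the job's own window has closed.

```python
from datetime import datetime


def is_timed_out(launched: bool, window_end: datetime, now: datetime,
                 run_again_tomorrow: bool) -> bool:
    """Hypothetical reading of the documented 'Timed Out' definition."""
    # A job that managed to launch is not Timed Out under this rule.
    if launched:
        return False
    # Jobs set to "Run Again Tomorrow" are carried over instead of timing out.
    if run_again_tomorrow:
        return False
    # Otherwise the status applies once the end of the window is reached.
    return now >= window_end


if __name__ == "__main__":
    end = datetime(2024, 1, 2, 0, 0)  # arbitrary date; window closes at 12am
    print(is_timed_out(False, end, datetime(2024, 1, 1, 20, 0), False))  # window still open
    print(is_timed_out(False, end, datetime(2024, 1, 2, 0, 5), False))   # past the window
```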

So, okay. That does make "Job Not ready by Start Time" sound worth considering as a trigger, either on the job itself or in the parent's "Group's Events" tab. That's still iffy, though, since events haven't been working for this issue anyway.

The job itself also shouldn't go to Timed Out until it hits the end of its window. We don't have Timed Out as an actual trigger for anything, but the idea of a second job checking the status of the hanging job would still work.

Find a point in time where you're certain that the hanging job should have completed, or at least should have entered an Active state. That might be the end of the time window, or some other point in the job flow.

The Monitor job will have two dependencies on the hanging job: one that is satisfied if the PeopleSoft job is <> Active, and another that is satisfied if it is <> Completed Normally. Both must be met for Monitor to run.

The Monitor job has its own event where it sends out an email when it completes normally.

So let's say the PeopleSoft job should have completed by 8:00pm. Monitor launches at 8:00pm and runs only if PeopleSoft is neither Active nor Completed Normally. Monitor then completes normally itself and sends out an email.

Give Monitor a time window that closes at 8:01pm so that it doesn't just sit in Waiting on Dependency all day.
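The Monitor decision described above can be sketched as a small function. This is a hypothetical Python model of the logic only, not a Tidal API: the `Status` enum lists just the statuses mentioned in this thread, and the 8:00pm to 8:01pm window mirrors the example times.

```python
from datetime import time
from enum import Enum


class Status(Enum):
    # Only the statuses mentioned in this thread; not an exhaustive Tidal list.
    ACTIVE = "Active"
    COMPLETED_NORMALLY = "Completed Normally"
    COMPLETED_ABNORMALLY = "Completed Abnormally"
    TIMED_OUT = "Timed Out"


def monitor_sends_email(watched: Status, now: time,
                        window_open: time = time(20, 0),
                        window_close: time = time(20, 1)) -> bool:
    """Would the hypothetical Monitor job run (and email) at `now`?"""
    # Monitor may only start inside its own 8:00pm to 8:01pm window.
    in_window = window_open <= now <= window_close
    # Dependency 1: watched job <> Active.
    # Dependency 2: watched job <> Completed Normally. Both must hold.
    deps_met = watched is not Status.ACTIVE and watched is not Status.COMPLETED_NORMALLY
    return in_window and deps_met


if __name__ == "__main__":
    for s in Status:
        print(s.value, monitor_sends_email(s, time(20, 0)))
```

Note that Completed Abnormally also returns True here. That is the duplicate-email case: the existing abnormal-completion alert and Monitor would both fire.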

Man. That is really clumsy, but it should at least work and prevent you from having to stare at it all day. Right now, I'm hoping that a Job Event applied to a parent on the "Group's Events" tab will work with a little experimentation.

