Email alert for jobs stuck in Launched state

sakshirajawat · 10-06-2013

Is there any email alert that triggers when jobs get stuck in Launched state for more than 5 minutes?

I know about the "Job launched" event, which triggers whenever a job goes into Launched state before becoming Active, but I need an email triggered for jobs that stay stuck in Launched for too long.

Will the event "job not ready by start time" be of any use here?

Carolanne Fougerat · 10-07-2013

In 5.3 we have a few alerts generated from querying the database directly - this is one of them.

Others have to do with the compile count threshold, and others query msglog for specific errors such as SMTP.

You just need to query the jobmst and jobrun tables for jobrun_status = 50 (Launched), then use Jobrun.JOBRUN_STACHGTM, JOBRUN_LaunchTM and whatever other fields apply to your needs.

The documentation comes with a chm file that contains a database dictionary describing the tables and fields.

Steve Atwood · 10-27-2013

The event "job not ready by start time" will not be of any use in this case.

We have the same problem (we're on 6.1.384), and we run the query below as an "OracleDB" job. It returns an integer, and rather than looking at the exit code, we set up an output scan so that the job simply fails if it returns anything other than '0'. That failure fires the events that raise an alert and send an email. Another event then marks the job as 'Completed Normally' so that it can run again on its 20-minute cycle. It works well. We allow 10 minutes in 'Launched' status, however, because we have seen some jobs go 'Active' after 5 minutes in Launched status. Generally, if a job hasn't gone 'Active' after 10 minutes, it never will without manual intervention.

SELECT COUNT(jobrun_id)
FROM tidal.jobrun
WHERE jobrun_status = 50
AND (SYSDATE - jobrun_stachgtm) * 24 * 60 > 10
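
For anyone unsure about the arithmetic: subtracting two Oracle DATE values gives a number of days, so multiplying by 24*60 converts it to minutes. Below is a rough Python sketch of the same check and of the "fail unless the output is 0" scan. It is only an illustration, not anything Tidal runs: the function names, the sample rows and the non-Launched status value 99 are made up, and only the status value 50 and the 10-minute allowance come from the query above.

```python
from datetime import datetime, timedelta

LAUNCHED = 50  # jobrun_status value for 'Launched', as used in the query above


def minutes_in_status(now, status_changed_at):
    """Mirror (sysdate - jobrun_stachgtm)*24*60: the difference in days, scaled to minutes."""
    days = (now - status_changed_at) / timedelta(days=1)
    return days * 24 * 60


def count_stuck(rows, now, threshold_minutes=10):
    """Count Launched rows whose last status change is older than the threshold.

    rows is a hypothetical list of (jobrun_id, jobrun_status, jobrun_stachgtm) tuples.
    """
    return sum(
        1
        for _jobrun_id, status, changed_at in rows
        if status == LAUNCHED and minutes_in_status(now, changed_at) > threshold_minutes
    )


def output_scan_passes(query_output):
    """Pass only when the query printed exactly '0', i.e. no job is stuck."""
    return query_output.strip() == "0"


# Made-up sample data for illustration only.
now = datetime(2013, 10, 27, 12, 0)
rows = [
    (101, 50, datetime(2013, 10, 27, 11, 45)),  # 15 minutes in Launched: counted
    (102, 50, datetime(2013, 10, 27, 11, 55)),  # 5 minutes: within the allowance
    (103, 99, datetime(2013, 10, 27, 11, 0)),   # hypothetical other status: ignored
]
stuck = count_stuck(rows, now)
print(stuck, output_scan_passes(str(stuck)))
```

With this sample data one job is counted, so the scan would fail and the alert events would fire.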

We have a custom view that shows all jobs in 'Launched' status, for the operator to refer to for details. We also run the query below as part of a "status report" (basically a group of queries) every hour around the clock. This helps the scheduling team track the problem in more detail:

SELECT j.jobmst_name "JOB", r.jobrun_id "JOBID",
       to_char(r.jobrun_stachgtm, 'DD-MON-YYYY HH24:MI') "LAUNCHED AT",
       n.nodmst_name "SERVER",
       to_char(SYSDATE, 'DD-MON-YYYY HH24:MI') "CURRENT TIME",
       (SYSDATE - r.jobrun_stachgtm) * 24 * 60 "MINS STUCK"
FROM tidal.jobrun r
JOIN tidal.jobmst j ON j.jobmst_id = r.jobmst_id
JOIN tidal.nodmst n ON r.nodmst_id = n.nodmst_id
WHERE r.jobrun_status = 50
AND (SYSDATE - r.jobrun_stachgtm) * 24 * 60 > 10
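
In case the format mask is unfamiliar, here is a small Python sketch of what one row of that report would look like. It assumes the database session uses English month names, and the job name, server name and helper functions are invented for the example; the real report comes straight from the query.

```python
from datetime import datetime

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def format_like_to_char(ts):
    """Approximate to_char(ts, 'DD-MON-YYYY HH24:MI') with English month abbreviations."""
    return f"{ts.day:02d}-{MONTHS[ts.month - 1]}-{ts.year:04d} {ts.hour:02d}:{ts.minute:02d}"


def report_line(job, jobrun_id, launched_at, server, now):
    """Build one report row: JOB, JOBID, LAUNCHED AT, SERVER, CURRENT TIME, MINS STUCK."""
    mins_stuck = (now - launched_at).total_seconds() / 60
    return " | ".join([
        job,
        str(jobrun_id),
        format_like_to_char(launched_at),
        server,
        format_like_to_char(now),
        f"{mins_stuck:.1f}",  # rounded here for readability; the query returns the raw value
    ])


# Hypothetical job and server names, for illustration only.
line = report_line(
    "nightly_load", 101,
    datetime(2013, 10, 27, 11, 45),
    "appsrv01",
    datetime(2013, 10, 27, 12, 0),
)
print(line)
```

The query itself returns "MINS STUCK" unrounded, so wrapping that expression in ROUND() is an option if the fractional minutes are distracting.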