On Tidal 5.3, we have seen that a job can be set to Completed Normally after, say, it has Completed Abnormally. I want to audit this event.
One way I can see to do this is by creating a job event:
Step 1: Define a job action under Actions -> Jobs
e.g. FailedJob (this would, say, log to a database that the monitored job was set to Completed Normally by an operator)
Step 2: Define a job event under Events -> Job Events (to call FailedJob, defined above, when the trigger is fulfilled)
I need to select a trigger, and this is where the problem is: there is no trigger like "Operator set the job to Completed Normally".
Is there any other way to do this?
We like to keep our operators informed by using the native alerts in Tidal. Whenever one of our jobs fails, an alert is created, which serves as a permanent log of failures. An added benefit of alerts is that they let operations note who has been contacted, and either they or the second-level support contact can set the alert to Acknowledged. Later, operations or BAs can set the alert to Closed once the job's customer acknowledges the resolution. Even if you don't adopt this process entirely, your operations staff can still use it as a quick way to view the day's failures (or to check back on any previous day, even after jobs have been set to Completed Normally).
If the operator is re-running a process, this also gives them a place to comment (restarted because the database was down; restarted because DBAs had to add space, then re-ran; etc.). If you just want a report of all jobs set to Completed Normally that had previously failed, you can write a query against the Tidal metadata for event activity. Keep in mind that jobs that were held or cancelled can also end up set to Completed Normally, and those are a different situation. This is why I like the alert approach: it gives the operator (and next-day reviewers) a place to check what happened and why.
We also set up a reminder process that emails our operations staff if any alert has not been acknowledged within 30 minutes. We use a Tidal repository metadata query that we run from ETL to send the appropriate notices, because we notify at escalation intervals (30, 60, and 90 minutes, for example).
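The escalation logic behind such a reminder can be sketched roughly as below. This is only an illustration in Python, not the actual ETL process described above: the alert records are assumed to have already been fetched by whatever metadata query you use, and the dictionary keys (`job`, `raised_at`, `acknowledged`) are hypothetical names, not Tidal column names.

```python
from datetime import datetime, timedelta

# Escalation thresholds in minutes, taken from the 30/60/90 example above.
ESCALATION_MINUTES = (30, 60, 90)


def escalation_level(raised_at, now, acknowledged):
    """Return the highest threshold (in minutes) an unacknowledged alert has passed.

    Returns None if the alert is acknowledged or younger than the first threshold.
    """
    if acknowledged:
        return None
    age = now - raised_at
    passed = [m for m in ESCALATION_MINUTES if age >= timedelta(minutes=m)]
    return max(passed) if passed else None


def alerts_to_notify(alerts, now):
    """Return (alert, level) pairs for alerts that are due a reminder.

    `alerts` is an iterable of dicts with the hypothetical keys
    'job', 'raised_at' (datetime) and 'acknowledged' (bool).
    """
    due = []
    for alert in alerts:
        level = escalation_level(alert["raised_at"], now, alert["acknowledged"])
        if level is not None:
            due.append((alert, level))
    return due
```

A scheduled job could call `alerts_to_notify(rows, datetime.now())` and send one notice per returned pair, choosing the recipient list by `level`.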
Adjust your system-wide job event, 'je public Job Failed' (your name will vary, but you typically have one system-wide event that applies to all jobs and logs all job failures), to include a new action called 'ad alert Job Error', defined as follows (set Security Level to Error):
An error alert has been generated for:
Job ID: <JobID>
Job Status: <JobStatus>
Approximate original date/time of job error:
Time: <SysTime.h:mm AM/PM>
I appreciate you taking the time to respond to my question.
Your response was very helpful, but we already have these alerts set up using the native alerts.
The specific case that is not covered is:
a. Job starts and ends as "Completed Abnormally"
b. Operator is told that a back end fix has been completed to address the failure
c. Operator sets the Tidal job to "Completed Normally"
I am not aware of an alternative that allows me to audit this event. I do not have access to the Tidal metadata (its database).
Is there a way to do so given this limitation?
This is recorded in the audit log. Do you need to catch it in real time, or are you looking for an audit?
If you search for "request" in the text field, these entries should come up, listed by user name:
Sent request to set job status for "<Event Name>" to Completed Normally.
Your request to set the status of job "<Event Name>" to "Completed Normally" was processed successfully.
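If the audit log entries can be exported as text, the two messages quoted above could be picked out with a small script. The following is a minimal Python sketch under that assumption: the `(user, text)` tuple shape is hypothetical, and the patterns are built only from the two message texts quoted above, so they may need adjusting for your version.

```python
import re

# Patterns derived from the two audit messages quoted above.
SENT_RE = re.compile(
    r'Sent request to set job status for "(?P<job>.+?)" to (?P<status>[^".]+)\.'
)
PROCESSED_RE = re.compile(
    r'Your request to set the status of job "(?P<job>.+?)" to "(?P<status>[^"]+)" '
    r"was processed successfully\."
)


def manual_status_changes(entries, status="Completed Normally"):
    """Return audit entries where a user set a job to the given status.

    `entries` is an iterable of (user, text) tuples; this shape is an
    assumption about how the audit log was exported, not a Tidal format.
    """
    results = []
    for user, text in entries:
        for kind, pattern in (("sent", SENT_RE), ("processed", PROCESSED_RE)):
            match = pattern.search(text)
            if match and match.group("status").strip() == status:
                results.append({"user": user, "job": match.group("job"), "kind": kind})
                break
    return results
```

Keeping the "sent" and "processed" kinds separate makes it possible to spot requests that were sent but never confirmed as processed.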
We pull all operational actions and alerts into CSV files using queries. I put these in a protected area that the Tidal administration team and a runtime user can reach, and we use them for any auditing activities.
We have a SQL back end, and I could give you those queries if you need them.