Carol Kelpin

Jobs hanging in "Launched" status on some agents

I have a ticket in with Cisco but no response yet so thought I'd check here.

I have Tidal 6.0.3, which has been relatively stable for 2 months now. I have some Windows servers whose agents have been running for several months, and now jobs hang in "Launched" status on them.

When this has happened in the past, restarting the agent or rebooting the agent host resolved it and jobs could run again. This time that is not working. I even restarted all Tidal services to see if it cleared anything up, with the same results.

There have been no changes or updates to the servers. Has anyone had this before? Can someone point me to a resolution to look at? My Scheduler group is starting to hound me about getting these agents going again.

Marc Clasby

Launched is a state before the job goes active.

Job lifecycle from help:

Waits in the production schedule for its dependencies to be met.

Enters a queue and waits for an execution slot to become available.

Launches on its designated agent.

Starts execution successfully on its designated agent.

Completes normally.

    So the agent should have been assigned the job by the master, and the job is getting ready for execution (going active).

    It should be getting a PID.

    Does the job status tab have an External ID? And does that ID/PID exist on the Tidal Agent as an active process?

    (Tidal External ID = server PID)

    Remote to the agent, open Task Manager, go to the Processes tab, select the menu item View > Select Columns, and choose PID (Process Identifier). Make sure the check box is checked for [x] Show processes from all users.

    Look for the External ID in the PID column.

    If it is there (probably using no CPU/memory), then the problem is likely on the agent side and could be the code itself.

    If it is not there (more likely), then the master was unable to communicate with the agent, and you can investigate the master logs (check the agent communication port, increase the logging level to high debug, get Cisco to assist, check the network, etc.).
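If you would rather script the two checks above than click through Task Manager, here is a minimal Python sketch. It assumes Python is available on the machines involved. The helper names (`pid_in_tasklist_csv`, `pid_is_running`, `port_is_open`) are made up for illustration and are not part of Tidal. `tasklist` is the standard Windows command; the agent port is whatever your agent connection is configured to use.

```python
import csv
import io
import socket
import subprocess


def pid_in_tasklist_csv(csv_text, pid):
    """Return True if `pid` appears in the PID column of `tasklist /FO CSV /NH` output."""
    for row in csv.reader(io.StringIO(csv_text)):
        # Data rows look like: "image.exe","1234","Services","0","5,000 K"
        # Informational lines (e.g. "INFO: No tasks ...") have a single field and are skipped.
        if len(row) >= 2 and row[1].strip() == str(pid):
            return True
    return False


def pid_is_running(pid):
    """Run on the agent host (Windows only): list processes and look for `pid`."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return pid_in_tasklist_csv(result.stdout, pid)


def port_is_open(host, port, timeout=3.0):
    """Run from the master host: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the agent host, call `pid_is_running(<External ID from the job status tab>)`. From the master host, call `port_is_open("<agent hostname>", <agent port>)`. A successful TCP connect only shows the port is reachable; it does not prove the agent is processing requests.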

    Hope this helps

    "Launched" status means the master sent a request to the agent, but the agent did not process it. Since a reboot did not resolve the problem, I recommend you try deleting the agent working directory.

    • From the Tidal Client, disable the agent (Administration, Connections)
    • Log on to the Windows server
    • Stop the agent service
    • Go to the Tidal Agent directory - \Program Files\TIDAL\Agent
    • Delete the TIDAL_AGENT_1 directory.
    • Restart the agent service; this will recreate the TIDAL_AGENT_1 directory
    • From the Tidal Client, enable the agent
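For anyone who wants to script the server-side part of the steps above, here is a rough Python sketch. Note one deliberate change: it moves TIDAL_AGENT_1 to a backup folder instead of deleting it, so Support can still look at the old contents. The function names, the backup location, and the service name you pass in are assumptions for illustration; check the actual service name in the Windows Services console. Disabling and enabling the agent in the Tidal Client still has to be done by hand.

```python
import shutil
import time
from pathlib import Path


def service_commands(service_name):
    """Return the Windows `sc` commands that stop and start a service."""
    return (["sc", "stop", service_name], ["sc", "start", service_name])


def set_aside_workdir(agent_root, backup_root, instance="TIDAL_AGENT_1"):
    """Move the agent working directory out of the way instead of deleting it.

    Returns the backup path, or None if the working directory does not exist.
    The agent service must already be stopped, or Windows may refuse the move.
    """
    workdir = Path(agent_root) / instance
    if not workdir.is_dir():
        return None
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    # A timestamp keeps repeated runs from colliding with an earlier backup.
    backup = backup_root / f"{instance}-{time.strftime('%Y%m%d-%H%M%S')}"
    shutil.move(str(workdir), str(backup))
    return backup
```

A typical run would pass the stop command to `subprocess.run`, call `set_aside_workdir(r"C:\Program Files\TIDAL\Agent", r"C:\Temp\tidal-backup")` (the drive letter and backup folder are placeholders), and then run the start command so the agent recreates the directory.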

    Good luck.

    In 5.3, when all else failed (the suggestions above), I used sacmd to force the status of the jobs to Completed Normally or Completed Abnormally, depending on what the users needed.

    Obviously, if the underlying problem is not fixed (i.e. network issue, etc.), jobs will continue to get stuck in Launched.

    In 6.1, the jobs I have seen stuck in Launched seem to recover with a failover (or master bounce) like you mentioned.

    It looks like it was a corrupted or bad file event (although these file events have been running for several months or longer). I spent 2 hours with Support, and although there are only 14 file event jobs associated with this agent, one of them was the culprit hanging up the response. We disabled all file events and restarted the agent, and jobs worked. We then enabled the file events, and jobs still work.

    Thanks for updating us.  I always find it helpful to know different things to look for.  Did they say this was bug related or a weird anomaly?  We are on a different version but always something good to watch out for.
