Failing to catch CALL_DISCONNECTED state; script hits max executed steps

drotheahit · 01-29-2018

I've been trying to catch CALL_DISCONNECTED states in my script and terminate via On Exception, like this:

On Exception (Contact Inactive Exception) Goto End

Yet I occasionally get this error:

Default Script Error Report

Trouble Code: Connected

Trigger Info:

Aborting Reason: com.cisco.wfframework.obj.WFMaxExecutedStepsExceededException: No. of executed steps: 1500 Application ID: 35 Application Name: (deleted)
Contact: JTAPICallContact[id=919,type=Cisco JTAPI Call,implId=2411886/7,active=false,state=CALL_DISCONNECTED,inbound=true,handled=true,locale=en_US,aborting=false,app=App[name=(deleted),type=Cisco Script Application...

It seemed to me that ContactInactiveException wasn't catching the problem, so I added IllegalContactStateException as well, but I still get the error. It doesn't happen often (maybe once a day), but I want to eliminate it, and it seems the two On Exception statements should be catching it. Here's the code:
[Screenshot: UCCX CALL_DISCONNECTED.JPG]

I've been using a default script to report errors back to me using the sSysTroubleCode variable. Obviously this section can loop infinitely if it keeps failing to connect, but that shouldn't happen because of the On Exception statements. I've also received error reports from sections of my script where there isn't any potential for a loop.

I also want to note this: on rare occasions over the past several years, while running a Reactive Debug, I've noticed the script doesn't terminate as it should when I hang up. Instead, it cycles back and forth between the two steps it was nearest when the call dropped. So I'm wondering whether this is some kind of bug or whether I'm doing something wrong.

Mark Swanson · 01-29-2018

Please post your script.

Mark Swanson · 01-29-2018

Just FYI...

The Contact Inactive Exception is just that... the caller likely abandoned the call, but this exception could potentially be caused by other factors as well (e.g. the user's phone crashed, network issues, etc.).

The Illegal Contact State Exception also concerns the caller (contact). The contact places an inbound call into the application, triggering the script, and the contact's state (or status) is 'active'. At some point the contact might become 'inactive'; if so, there are certain steps that can't handle an inactive contact. You shouldn't have to worry about this.

On Exception steps let you customize how these exceptions are handled. For example, let's say your script throws a Document Exception. You would add an On Exception step and then set the Doc variable again, perhaps to a backup file. Then you would use another Goto step to loop the call back through the script.

The Max Executed Steps Exceeded Exception was caused by your script exceeding the maximum number of steps defined under UCCX Admin > System > System Parameters > Application Parameters > Max Number of Executed Steps. The default value is 1,000, but you likely changed this value to 1,500.

You're probably right: your script is hitting the Failed branch of the Connect step and then looping endlessly through your script until it reaches 1,500 steps.
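To see why a single failing branch is enough to exhaust the limit, here is a minimal Python model. It is not UCCX script code: the step sequence (Set, Connect, Goto back on Failed) is an assumption that loosely mirrors the loop described above, and `MaxExecutedStepsExceeded` and `run_retry_loop` are hypothetical stand-ins, not part of the engine.

```python
class MaxExecutedStepsExceeded(Exception):
    """Model-only stand-in for WFMaxExecutedStepsExceededException."""


def run_retry_loop(max_steps, connect_succeeds):
    """Walk a hypothetical Set -> Connect -> (Failed) Goto loop under a step budget.

    Returns the number of executed steps if the model reaches End;
    raises MaxExecutedStepsExceeded once the budget is used up.
    """
    executed = 0

    def execute_step():
        nonlocal executed
        # Like the engine's limit, refuse to run any step past the budget.
        if executed >= max_steps:
            raise MaxExecutedStepsExceeded(executed)
        executed += 1

    while True:
        execute_step()  # Set sSysTroubleCode
        execute_step()  # Connect
        if connect_succeeds():
            execute_step()  # End
            return executed
        execute_step()  # Failed branch: Goto back to the retry label
```

With `run_retry_loop(1500, lambda: True)` the model reaches End after 3 steps; with `run_retry_loop(1500, lambda: False)` it never does and raises after 1,500 steps (500 passes of three steps each). Nothing inside the modeled loop checks whether the caller is still on the line, so the step budget is its only exit.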

drotheahit · 01-29-2018

Ok then it sounds like my initial usage of just the ContactInactiveException should be handling the situation, but obviously it's not. Otherwise it wouldn't loop through the failed branch.

I've attached the full script and its subflow. It's actually a zip file, so please just rename the file extension.

Mark Swanson · 01-29-2018

No, not exactly. The Failed outcome of the Connect step pertains to the connection status between the caller and the agent. Basically, the call failed to reach an agent. This failure can have a number of causes, but more often than not the call failed to connect because:

1. The agent didn't answer before the ring timeout value (10 seconds) expired.

Do you guys place agents into 'Not Ready' or 'Ready' status after RNA (Ring No Answer) events? That's set under the service parameters.

2. The agents might be using unsupported phone features, such as iDivert or DND, which cause the Connect step to fail.

An example of a ContactInactiveException would be an abandoned call. Run a Reactive Debug and, once the script has passed the On Exception step, abandon the call and see what happens. It's going to proceed to the End label, which is basically the default behavior. If you want to "test" this exception, try this: near the end of your script, add a Label named InactiveExcpt and point the ContactInactiveException to this Label. Then Set the sSysTroubleCode variable to whatever you want. This could be used for reporting purposes, or perhaps you want to track abandoned calls via email; whatever suits you.

Check the release notes and make sure you're not using or doing anything that's NOT supported.

Mark Swanson · 01-29-2018

Ahh, where is your End label, anyway? I don't see it.

EDIT: Never mind... I see you're using the embedded labels.

drotheahit · 01-29-2018

At the very bottom of the script, on the End step itself.

Mark Swanson · 01-29-2018

Ahh, I see. Your script contains the Select Resource step as well as the Connect step. You only use the Connect step if the Select Resource step doesn't connect your call. In your case, the Select Resource step is connecting calls and using the 20-second timeout value, so the only outcomes are Connected or Queued. There's no need for the Connect step unless you want to pass variables to Cisco Finesse. Remove the Connect step and try it again.

drotheahit · 01-29-2018

Ok thanks. Going to remove the connect step and see how that goes.

But I want to say this: regardless of how the initial failure happened, at some point the call state must have become disconnected before the script hit the max (otherwise the report would say the call state is connected). That means the exception should still have sent it to the end of the script, and the max should never have been reached in the first place. At least, that's how it seems to me.

I also have an error report stating the script hit the max with sSysTroubleCode set to "Menu to PR" (option 1, main menu). I can't see any way for a call to get stuck in that area and hit the max. There are no loops, and the only branches out are AfterHours, XMLClosed and QueueSelectResource. If you follow the script, both AfterHours and QueueSelectResource immediately set sSysTroubleCode to something else. XMLClosed doesn't, but it jumps to AfterHours after just two steps. Does this make sense?

Mark Swanson · 01-30-2018

To answer your first question:

"At some point the call state must have became disconnected before the script hit max (otherwise it would say call status is connected). Meaning that the exception should have still kicked it to the end of the script, and the max should have never been reached in the first place".

Not exactly. Your original script used both of these steps: Select Resource and Connect. Under the Connected branch of the Select Resource step, you added the Connect step. The problem is that the Select Resource step itself has the ability to connect calls. If you set Connect to Yes, you wouldn't insert the Connect step; if you set it to No, you would insert the Connect step.

Important Note: Changing this value (i.e. Connect = Yes / No) under the Select Resource step is going to change the branches of this step. If the value is Yes, the branches of the Select Resource step will be 'Connected' and 'Queued'. If the value is No, the branches of the Select Resource step will be 'Selected' and 'Queued'.

In your case, you connected the call via the Select Resource step (Connect = Yes). The call takes the 'Connected' branch when the agent answers the call. At this point the call is connected and the script should be over and done. You can't connect a call that's already connected, so the Connect step fails; hence your problem.

If you want to use the Connect step, you can. Modify the Select Resource step by changing the Connect value to No. Under the 'Selected' branch, you would insert the Connect step. I would suggest increasing the timeout value from 10 seconds to 20 seconds.
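The branch logic above can be modeled in a few lines of Python. This is only a sketch, not UCCX code: `Contact`, `select_resource` and `connect_step` are hypothetical names, the contact is reduced to a single "already connected" flag, and the "Connected" outcome returned by the modeled Connect step is an assumption (the thread only names its Failed branch).

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """Hypothetical caller; only tracks whether it is already connected to an agent."""
    connected: bool = False


def select_resource(contact, connect, agent_available=True):
    """Branch taken by a modeled Select Resource step.

    Connect = Yes -> branches Connected / Queued
    Connect = No  -> branches Selected / Queued
    """
    if not agent_available:
        return "Queued"
    if connect:
        contact.connected = True  # Select Resource connects the call itself
        return "Connected"
    return "Selected"  # agent chosen; the script still has to connect the call


def connect_step(contact):
    """Modeled Connect step: it cannot connect a call that is already connected."""
    if contact.connected:
        return "Failed"
    contact.connected = True
    return "Connected"


# Original design: Connect = Yes, then a Connect step under the Connected branch.
original = Contact()
print(select_resource(original, connect=True), connect_step(original))  # Connected Failed

# Suggested design: Connect = No, then a Connect step under the Selected branch.
suggested = Contact()
print(select_resource(suggested, connect=False), connect_step(suggested))  # Selected Connected
```

The model keeps the "who connects the call" decision in exactly one place per design, which is the point of choosing either Connect = Yes or a separate Connect step, not both.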

Mark Swanson · 01-30-2018

To answer your second question...

I don't believe the Menu, XML, Set, etc. steps you created are causing problems. Option #1 is hitting the same Select Resource and Connect steps. Calls are connected and handled by agents, right? That's because of the Select Resource (Connect = Yes) step. The Connect step under the 'Connected' branch of the Select Resource step is most likely the problem child.

drotheahit · 01-31-2018

Hi Mark - thanks. As I said, I modified my script and removed the Connect step. It now looks like this:

Connected
    Set sSysTroubleCode = "Connected"
    Set Session Info (session)
    Set Contact Info (--Triggering Contact--, handled)
    End
Queued
    etc. (nothing below changed)

That change was on Monday eve (Jan. 29). Since then I've received five errors hitting max steps:
Note: I did add a new variable sCallStart which I'm passing to the default script because I wanted to track duration.

#1:

Call start: January 30, 2018 11:48:32 AM

Script End (by Error Report): January 30, 2018 11:50:52 AM

sSysTroubleCode = "Queued"

Trigger Info:

Aborting Reason: com.cisco.wfframework.obj.WFMaxExecutedStepsExceededException: No. of executed steps: 1500 Application ID: 33 Application Name: (deleted)

Contact: JTAPICallContact[id=5649,type=Cisco JTAPI Call,implId=2953873/2,active=false,state=CALL_DISCONNECTED,inbound=true,handled=true,locale=en_US,aborting=false

#2:

Call start: January 30, 2018 2:49:12 PM

Script End (by Error Report): January 30, 2018 2:50:39 PM

sSysTroubleCode = "Leave Message"

Trigger Info/Aborting Reason same as #1.

#3:

Call start: January 30, 2018 3:39:01 PM

Script End (by Error Report): January 30, 2018 3:40:02 PM

sSysTroubleCode = "Queued"

Trigger Info/Aborting Reason same as #1.

#4:

Call start: January 31, 2018 11:50:01 AM

Script End (by Error Report): January 31, 2018 11:58:14 AM

sSysTroubleCode = "Queued"

Trigger Info/Aborting Reason same as #1.

#5:

Call start: January 31, 2018 4:06:51 PM

Script End (by Error Report): January 31, 2018 4:08:30 PM

sSysTroubleCode = "Connected"

Trigger Info/Aborting Reason same as #1.

As you can see, the durations are generally short. Also, my queue wait time (iWaitTime) is 90 seconds. Do you see any way this could be occurring in my script?
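The durations can be computed directly from the five reports with Python's standard `datetime` module. The timestamps below are copied from the reports; the format string is an assumption that matches how they are written.

```python
from datetime import datetime

# Timestamp format used in the error reports above.
FMT = "%B %d, %Y %I:%M:%S %p"

# (call start, script end) pairs copied from reports #1 to #5.
REPORTS = [
    ("January 30, 2018 11:48:32 AM", "January 30, 2018 11:50:52 AM"),
    ("January 30, 2018 2:49:12 PM", "January 30, 2018 2:50:39 PM"),
    ("January 30, 2018 3:39:01 PM", "January 30, 2018 3:40:02 PM"),
    ("January 31, 2018 11:50:01 AM", "January 31, 2018 11:58:14 AM"),
    ("January 31, 2018 4:06:51 PM", "January 31, 2018 4:08:30 PM"),
]


def duration_seconds(start, end):
    """Seconds between call start and the error report."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


durations = [duration_seconds(start, end) for start, end in REPORTS]
print(durations)  # [140, 87, 61, 493, 99]
```

Report #3 (sSysTroubleCode = "Queued") aborted 61 seconds after the call started, so that call was queued for at most 61 seconds, less than the 90-second iWaitTime.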

Mark Swanson · 02-01-2018

Move the Set Enterprise Call Info step and Set Session Info step to just before the Select Resource step or Connect step, and perhaps again near the end of your script. Then you could remove most of the duplicate Set Session Info steps and use Set steps to assign whatever values you need.
You might want to consider using a Switch statement rather than a series of IF statements, since slightly fewer steps are executed. Using 3 or 4 steps compared to 1 or 2 doesn't really make a difference beyond being a slightly more efficient scripting technique. It isn't the source of your problem (exceeding 1,500 steps).
You can remove the Set Contact Info step that marks the call as handled; it isn't needed. As soon as the call reaches the Connected branch of the Select Resource step, the system identifies the call as 'handled'. Again, this is not related to your problem.
What's the purpose of the IF Statement within the queue? Did you set the value of the 'iForcetoVMafterMins' from the Application?
If FALSE, you dequeue calls and attempt to mark them as handled. You can't do that. If you dequeue calls, they will be reported as 'Presented' and 'Dequeued' but 'Not Handled'. You can mark calls as handled, in which case they will be reported as 'Presented' and 'Handled', but not as 'Dequeued'.
Also, rather than introducing another End step into your script, remove the End step under the Connected branch of the Select Resource step and add a Goto step pointing to the End label of your script. This helps to prevent unexpected loops and exceptions.
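The If-versus-Switch point can be put in numbers with a rough Python model, assuming each If step counts as one executed step and a Switch counts as one regardless of which branch it takes. The function names and menu values are hypothetical. It also shows why the saving is small: a few steps per call, nowhere near 1,500.

```python
def if_chain_steps(options, value):
    """Steps executed by a chain of If steps that test options in order (model)."""
    for executed, option in enumerate(options, start=1):
        if option == value:
            return executed  # stop at the first matching If
    return len(options)  # every If was evaluated and none matched


def switch_steps(options, value):
    """A single Switch step evaluates the value once, whatever the branch (model)."""
    return 1


MENU_OPTIONS = ["1", "2", "3", "4"]  # hypothetical menu choices
print(if_chain_steps(MENU_OPTIONS, "4"), switch_steps(MENU_OPTIONS, "4"))  # 4 1
```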

How many steps do you have in the three subflows?

How many calls do you guys handle a day, and how often do you notice this exception? Do you notice any patterns in the logs or reports? For example, do these exceptions occur only when 1) agent xyz handles the call, 2) the calling number is 555-1234, or 3) calls come from a certain area code? If so, perhaps there are other factors you should consider, like problems with MTP or codec negotiation.

Mark Swanson · 02-01-2018

OK, I looked at your PA-SiteMain script again. You have three subflows: PA-IsOpen, PA-IsOpen and HolidayCheckPA. I guess you have duplicate PA-IsOpen subflows because you want separate business hours for RS and PR. I reviewed your PA-IsOpen script, and the way you're identifying the business hours is likely causing an excessive number of executed steps. The subflow alone isn't exceeding 1,500 steps, but who knows, maybe it's consuming 1,000 of them.

My advice is to have one PA-IsOpen subflow for both RS and PR, and use another string variable such as sTeamName. You would pass this string variable to the subflow via input mappings. The subflow would contain a Switch statement using sTeamName as the switch value. You would insert the Day of Week and Time of Day steps under each branch of the Switch statement, and then determine whether the helpdesk is open or closed for each team. As before, you would pass the result back to PA-SiteMain via output mappings.
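Here is a minimal Python sketch of that single-subflow design, not UCCX code. A dictionary lookup on the team name stands in for the Switch step, and the weekday and time comparisons stand in for the Day of Week and Time of Day steps. The team codes RS and PR come from the thread; the hours and the `TEAM_HOURS`/`is_open` names are hypothetical, and holiday handling (HolidayCheckPA) is not modeled.

```python
from datetime import datetime, time

# Hypothetical business hours; the real ones live in the PA-IsOpen subflows.
# weekday(): Monday == 0 ... Sunday == 6
TEAM_HOURS = {
    "RS": {"days": {0, 1, 2, 3, 4}, "open": time(8, 0), "close": time(17, 0)},
    "PR": {"days": {0, 1, 2, 3, 4, 5}, "open": time(7, 30), "close": time(18, 0)},
}


def is_open(team_name, now):
    """One subflow for both teams: branch on team_name (sTeamName), then check day and time."""
    hours = TEAM_HOURS.get(team_name)  # stands in for the Switch step
    if hours is None:
        return False  # unknown team: treat as closed
    return (
        now.weekday() in hours["days"]
        and hours["open"] <= now.time() < hours["close"]
    )


print(is_open("RS", datetime(2018, 1, 30, 11, 48)))  # True (Tuesday, 11:48)
print(is_open("RS", datetime(2018, 2, 3, 10, 0)))    # False (Saturday)
print(is_open("PR", datetime(2018, 2, 3, 10, 0)))    # True (Saturday, 10:00)
```

Only the branch for the requested team is evaluated, so each call runs a handful of checks instead of walking every team's schedule.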

This approach should drastically reduce the number of executed steps performed by your script(s). But, before you do anything... what do you mean by this?

Application ID: 33 Application Name: (deleted)

Did you purposely delete the Application Name? Or, is this Application using the PA-SiteMain script?

Also, check out this link about tracing the executed steps:

https://supportforums.cisco.com/t5/collaboration-voice-and-video/uccx-viewing-executed-script-steps-via-cli/ta-p/3162231

I never had the need to use it myself. Good luck.

Mark Swanson · 02-08-2018

Just a follow up. Were you able to identify and resolve this problem yet?