Re: 50002 - What is considered statistically normal versus high frequency of these errors?

jasondurie@moonjet.com · ‎09-06-2018

50002 errors can never be eliminated entirely, so how do you know when you have done enough to reduce them? What is normal versus excessive? We have developed a metric that can be shared across Cisco partners and customers to help us all determine where we stand in the industry.

Run the query below against your UCCE instance and share the results. It produces daily statistics and, ultimately, a ratio of 50002 errors per agent hour logged in (50002dedups_Hours_%).

select
    convert(date, Date) Date,
    count(1) Agents,
    1.0*nullif(sum([50002_dedups]), 0) / nullif(sum(HoursLoggedIn), 0) [50002dedups_Hours_%],
    sum(Logins) Logins,
    sum(HoursLoggedIn) HoursLoggedIn,
    sum([50002_dedups]) [50002_dedup],
    sum([50002s]) [50002s]
from
(
    select
        *,
        datediff(HH, LoginDate, LogoutDate) HoursLoggedIn -- hours logged in
    from (
        select
            SkillTargetID,
            convert(date, DateTime) Date,
            max(case when Event=1 then 1 else 0 end) Logins, -- only count 1 login per agent per day
            min(case when Event=1 then aed.DateTime else null end) LoginDate,
            max(case when Event=2 then aed.DateTime else null end) LogoutDate,
            sum(case when Event=3 and ReasonCode=50002 then 1 else 0 end) [50002s],
            sum(case when Event=3 and ReasonCode=50002 and Duration<>900 and Duration<>1800 then 1 else 0 end) [50002_dedups] -- remove duplicate 50002 (15 min interval = 900 seconds, 30 min interval = 1800 seconds)
        from Agent_Event_Detail aed
        where aed.DateTime >= convert(date, getdate()-14) -- last 14 days
        group by aed.SkillTargetID, convert(date, DateTime) -- group by agent, then day
    ) A
) B
group by convert(date, Date)
order by convert(date, Date)
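
For anyone who wants to check the de-duplication rule outside SQL, here is a minimal JavaScript sketch of the same counting logic as the query above. The event objects and their field names (`event`, `reasonCode`, `duration`) and the function name `count50002` are assumptions for illustration that mirror the Agent_Event_Detail columns used in the query; they are not a Cisco API.

```javascript
// Assumed event shape (hypothetical): { event: 3, reasonCode: 50002, duration: 900 }
// Event 3 is the event type the query above filters on for 50002.
var REPEAT_DURATIONS = [900, 1800]; // 15 and 30 minute interval repeats, in seconds

function count50002(events) {
  var counts = { total: 0, dedup: 0 };
  events.forEach(function (e) {
    if (e.event !== 3 || e.reasonCode !== 50002) {
      return; // not a 50002 event, ignore it
    }
    counts.total += 1;
    // A duration of exactly 900 or 1800 seconds is treated as an interval repeat
    if (REPEAT_DURATIONS.indexOf(e.duration) === -1) {
      counts.dedup += 1;
    }
  });
  return counts;
}

// Made-up sample events, not real customer data
var sampleEvents = [
  { event: 3, reasonCode: 50002, duration: 12 },
  { event: 3, reasonCode: 50002, duration: 900 },
  { event: 3, reasonCode: 50002, duration: 1800 },
  { event: 1, reasonCode: 0, duration: 0 }
];
console.log(count50002(sampleEvents));
```

Note that, like the query, this also drops a genuine 50002 event that happens to last exactly 900 or 1800 seconds, so the dedup count is an approximation.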

I get asked about this error code everywhere I go, and when it is not correlated to a server-side or network event, it is often a rat hole that never seems to end. Use the results of this query to help decide whether investigating the causes of 50002 in your environment is worth the effort. The power of this data comes from sharing your results with the community. Thanks for participating.

samneil2000 · ‎09-28-2018

I am not a partner but a customer. However, I have recently opened my second TAC case for this very issue. During the first, I engaged the network team, who applied a QoS policy to prioritize the Finesse traffic between the call centers and the DCs. At the time we thought the problem was solved. Unfortunately, one of our centers recently complained about this issue and I had to open Pandora's box again. I ran a similar query against our Aceyus DB event detail table and grabbed all of the 50002 events by call center, along with the durations. I totaled over six thousand hours of lost productivity due to the random not ready issue. I have engaged my Cisco team to hopefully work with the BU and TAC on this because I feel like it might be a major issue that is just being swept under the rug...

jasondurie@moonjet.com · ‎11-05-2018

I have updated my original post to make this thread more efficient.

jasondurie@moonjet.com · ‎11-05-2018

Results from three customers:

Customer sample 1:

Date   Agents   50002dedups_Hours_%   Logins   HoursLoggedIn   50002_dedup   50002s
2018-10-22   219   0.097285067873   191   1326   129   1082
2018-10-23   213   0.080704328686   189   1363   110   923
2018-10-24   213   0.126013264554   189   1357   171   1127
2018-10-25   217   0.146994931209   192   1381   203   1291
2018-10-26   215   0.097451274362   187   1334   130   1244
2018-10-27   36   1.116279069767   7   43   48   1296
2018-10-28   30   22.000000000000   1   3   66   1276
2018-10-29   226   0.128401953942   204   1433   184   1207
2018-10-30   225   0.068868587491   198   1423   98   1106
2018-10-31   227   0.091718001368   203   1461   134   1002
2018-11-01   217   0.082327892122   207   1409   116   709
2018-11-02   206   0.091269841269   183   1260   115   1043
2018-11-03   41   0.812500000000   9   64   52   1381
2018-11-04   31   18.000000000000   3   3   54   1350
2018-11-05   28   NULL   0   NULL   5   162

Dates with out-of-range data are weekends with unusual traffic or planned maintenance and should be ignored. This customer hovers around 8.5% 50002 errors per hour logged in. The query counts only the initial occurrence of each error and filters out the repeats logged at each reporting interval.

Customer sample 2: around 4% 50002 errors per hour logged in. This customer uses WFM, and adherence is important to them, hence the lower numbers than Customer sample 1 above.

Date   Agents   50002dedups_Hours_%   Logins   HoursLoggedIn   50002_dedup   50002s
2018-10-22   949   0.048904296149   898   8077   395   3544
2018-10-23   955   0.037229641886   942   8461   315   1872
2018-10-24   991   0.038420236248   975   8381   322   1946
2018-10-25   886   0.134738899748   878   7162   965   3620
2018-10-26   889   0.041318681318   863   6825   282   3549
2018-10-27   244   0.056478405315   190   1505   85   4105
2018-10-28   35   NULL   0   NULL   0   3360
2018-10-29   960   0.044881125667   928   8244   370   3149
2018-10-30   994   0.033484963356   981   7914   265   2470
2018-10-31   1003   0.041812333939   959   7151   299   4116
2018-11-01   974   0.046942400405   915   7882   370   5276
2018-11-02   905   0.063762909744   835   6681   426   5861
2018-11-03   307   0.058593750000   232   1792   105   6246
2018-11-04   58   NULL   0   NULL   0   5700
2018-11-05   58   NULL   0   NULL   0   57

Customer sample 3: around 6% 50002 errors per hour logged in.

Date   Agents   50002dedups_Hours_%   Logins   HoursLoggedIn   50002_dedup   50002s
2018-10-22   93   0.071216617210   93   674   48   67
2018-10-23   89   0.072164948453   89   679   49   86
2018-10-24   93   0.059405940594   92   707   42   103
2018-10-25   90   0.054054054054   89   666   36   77
2018-10-26   86   0.057722308892   85   641   37   66
2018-10-27   1   NULL   1   7   0   0
2018-10-29   93   0.041786743515   93   694   29   59
2018-10-30   92   0.059574468085   91   705   42   65
2018-10-31   87   0.029411764705   87   646   19   26
2018-11-01   84   0.062500000000   84   640   40   61
2018-11-02   84   0.053833605220   84   613   33   42
2018-11-03   2   NULL   2   14   0   0
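
One way to turn a table like the one above into a single figure is to pool the de-duplicated 50002 counts and the logged-in hours across the busy days and divide once. The JavaScript sketch below does that for the Customer sample 3 rows. The `MIN_HOURS` cutoff of 100 and the function name `pooledRatio` are assumptions for illustration; the thread only says that out-of-range weekend days should be ignored, not how to pick them.

```javascript
// Rows copied from the Customer sample 3 table: [date, HoursLoggedIn, 50002_dedup]
var rows = [
  ["2018-10-22", 674, 48],
  ["2018-10-23", 679, 49],
  ["2018-10-24", 707, 42],
  ["2018-10-25", 666, 36],
  ["2018-10-26", 641, 37],
  ["2018-10-27", 7, 0],
  ["2018-10-29", 694, 29],
  ["2018-10-30", 705, 42],
  ["2018-10-31", 646, 19],
  ["2018-11-01", 640, 40],
  ["2018-11-02", 613, 33],
  ["2018-11-03", 14, 0]
];

// Hypothetical cutoff: skip days with too few logged-in hours to be meaningful
var MIN_HOURS = 100;

function pooledRatio(dayRows, minHours) {
  var hours = 0;
  var dedups = 0;
  dayRows.forEach(function (r) {
    if (r[1] >= minHours) {
      hours += r[1];
      dedups += r[2];
    }
  });
  // Return null rather than dividing by zero when no day qualifies
  return hours > 0 ? dedups / hours : null;
}

console.log(pooledRatio(rows, MIN_HOURS).toFixed(4)); // prints 0.0563 for these rows
```

A pooled ratio weights each day by its logged-in hours, so a quiet day cannot swing the result the way it can in a plain average of the daily ratios.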

samneil2000 · ‎11-05-2018

Here are the results from one of my systems:

Date	Agents	50002dedups_Hours_%	Logins	HoursLoggedIn	50002_dedup	50002s
10/22/2018	641	0.069073783	640	5096	352	416
10/23/2018	634	0.06993007	633	5005	350	439
10/24/2018	623	0.07115869	619	4764	339	438
10/25/2018	598	0.067743383	595	4458	302	382
10/26/2018	570	0.09506705	569	4176	397	463
10/27/2018	218	0.100123609	216	1618	162	180
10/28/2018	77	0.075645756	75	542	41	44
10/29/2018	627	0.05625879	627	4977	280	328
10/30/2018	621	0.055180871	619	4893	270	345
10/31/2018	613	0.065718947	611	4291	282	356
11/1/2018	583	0.051089759	580	4267	218	249
11/2/2018	547	0.047793221	545	4101	196	223
11/3/2018	211	0.038961039	210	1540	60	77
11/4/2018	75	0.033149171	74	543	18	18
11/5/2018	595	0.087250712	592	2808	245	262

Joe Gilbert · ‎11-12-2018

Thanks for posting this thread. This is an issue we've had for years from back in CAD 7.5 until now on Finesse 11.6. We've recently started testing a Finesse gadget that will change the user state back to the previous state if the not ready reason code is undefined. Has anyone tried this before?

Date Agents 50002dedups_Hours_% Logins HoursLoggedIn 50002_dedup 50002s
2018-10-29 538 0.052606060606 537 4125 217 405
2018-10-30 518 0.061866125760 505 3944 244 618
2018-10-31 494 0.073849545329 482 3629 268 700
2018-11-01 501 0.085490830636 483 3708 317 910
2018-11-02 514 0.079461457233 496 3788 301 868
2018-11-03 133 0.043478260869 115 690 30 797
2018-11-04 57 0.128318584070 38 226 29 811
2018-11-05 538 0.085023210359 522 4093 348 913
2018-11-06 505 0.056967103503 489 3739 213 947
2018-11-07 512 0.061830835117 508 3736 231 472
2018-11-08 509 0.046113306982 489 3795 175 723
2018-11-09 507 0.040619989310 490 3742 152 699
2018-11-10 121 0.066978193146 107 642 43 697
2018-11-11 61 0.007662835249 44 261 2 722
2018-11-12 512 0.083458646616 495 2660 222 503

jasondurie@moonjet.com · ‎11-12-2018

We've also been discussing this option.

Joe Gilbert · ‎11-13-2018

I have been testing the gadget code below in my test environment with good results. I force a CTI failure by restarting the Cisco Finesse Tomcat service on the Finesse server with the command below. I am planning on testing this with live users on Thursday and will let you know how it goes.

utils service restart Cisco Finesse Tomcat

Let me know if you have any feedback on this.

/* oldState and oldReason must be declared as global variables at the top of the gadget */

/**
 *  Handler for all User updates
 */
handleUserChange = function(userevent) {
	// read the current state and reason code once
	var currentState = user.getState();
	var notReadyCode = user.getNotReadyReasonCodeId();

	// shared callbacks for both setState calls
	var handlers = {
		success: function() { clientLogs.log("setState, success"); },
		error: function() { clientLogs.log("setState, error"); }
	};

	//log current state and reason code
	clientLogs.log("handleUserChange, current state is: " + currentState + " reason is: " + notReadyCode);

	if (currentState == "NOT_READY") {
		// 50002 or a missing reason code is treated as a CTI failure
		// (== null also matches undefined)
		if (notReadyCode == "50002" || notReadyCode == "undefined" || notReadyCode == null) {
			//log new state we are trying to set
			clientLogs.log("setState, new state is: " + oldState + " reason is: " + oldReason);

			if (oldState == "NOT_READY") {
				// restore the previous not ready reason code
				user.setState(oldState, { id: oldReason }, handlers);
			} else {
				user.setState("READY", null, handlers);
			}
		}
	}

	//always gather the old state and old reason code
	oldState = currentState;
	oldReason = notReadyCode;
};

Joe Gilbert · ‎12-10-2018

Just an update from a few weeks of testing this gadget with live users. The number of 50002 occurrences and the time spent in 50002 have stayed around average.

david.macias · ‎11-13-2018

@Joe Gilbert I'm trying to understand why you want to do this. Is this just so the agent does not see an unnecessary error on their Finesse?

david

jasondurie@moonjet.com · ‎11-13-2018

I don't know about Joe, but our reason would be simple: WFM adherence.

david.macias · ‎11-13-2018

Hey Jason. OK, so tell me more about this, as I'm not following the use case. A group of agents is logged in and all of a sudden someone kicks their network segment and puts them all offline. It takes them about 30 minutes to figure out the issue and get back into CCE. Adherence-wise, these agents were not online and receiving calls, so they would be out of compliance during that span.

david

jasondurie@moonjet.com · ‎11-13-2018

Your use case is valid. Here is use case #2: the agent is momentarily disconnected from Finesse while their browser is backgrounded, for whatever client-side reason. They think they are still ready but they are not.

Joe Gilbert · ‎11-13-2018

jasondurie@moonjet.com wrote:
Your use case is valid. Use Case #2. Agent is momentarily disconnected from Finesse while their browser is backgrounded for whatever client side reason. They think they are still ready but they are not.

This would be our use case as well. The end user is in the ready state in Finesse but working on something else in a different system. There is no pop-up or notification when they get put into Not Ready - CTI Failure, so they may sit in this state for a long time before changing back to ready.

The goal of the gadget I posted above is to store the user's state in a global variable. When the user's state changes to CTI Failure (not ready with undefined code), change the user back to the previous state.
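
The decision described above can be pulled out of the Finesse event handler so it can be tested without a Finesse server. The JavaScript sketch below is a minimal version under stated assumptions: the function name `decideRestore` and the `{ state, reasonId }` object shape are hypothetical and not part of the Finesse JavaScript API, and returning `null` when no previous state has been stored yet is a design choice of this sketch.

```javascript
// prev and current are plain objects (hypothetical shape):
//   { state: "READY" | "NOT_READY" | ..., reasonId: string | null }
// Returns the state to restore, or null when nothing should change.
function decideRestore(prev, current) {
  if (current.state !== "NOT_READY") {
    return null; // only not ready states are candidates
  }
  var code = current.reasonId;
  // 50002 or a missing reason code is treated as a CTI failure
  var isCtiFailure = code === "50002" || code === "undefined" || code == null;
  if (!isCtiFailure) {
    return null; // the agent chose this not ready reason, leave it alone
  }
  if (!prev) {
    return null; // assumption: nothing stored yet, so do not guess
  }
  if (prev.state === "NOT_READY") {
    // go back to the previous not ready reason
    return { state: "NOT_READY", reasonId: prev.reasonId };
  }
  // any other previous state goes back to READY
  return { state: "READY", reasonId: null };
}

console.log(decideRestore(
  { state: "READY", reasonId: null },
  { state: "NOT_READY", reasonId: "50002" }
));
```

In a gadget, the returned object would be passed to `user.setState` inside the user change handler, and the previous state would be updated after every event.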

Amith Kumar · ‎03-17-2019

Joe, it would really help if you could provide a snapshot of the updated gadget and code. We have a similar use case where an agent goes to not ready with a connection failure and we need to put them back into the not ready or ready state. If you have code or a script that does this automatically, it would be a great help. Kindly share.

Best Regards