UCCX upgraded to 12.5.1 - 100 agent OVA - now system very low on resources

keanej · 05-17-2021

We recently upgraded to version 12.5.1.11001-348. There are two servers in the cluster, a publisher and a subscriber.

100-agent model, so 2 vCPU / 10 GB RAM / 1 x 146 GB disk.
Typical usage is about 10 agents, but occasionally we have 30-40 agents in use.
We have 120 agents configured, but we have never had 50 agents logged in and set to Ready.
We have 9 applications / 9 triggers, with 4 in use. It all seems like a low load to me.

Everything is crawling, even when very few agents are logged in.

In RTMT I'm seeing virtual memory at 85% to 95%... when idle!

CPU is around 30%, but if you click on anything at all it spikes to 100%.
Common partition usage is 72% to 78%; swap partition usage is 85%.

Alerts: loads of 'LOW Available Virtual Memory':
Available virtual memory below configured threshold . Configured low threshold is 15 %
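To see how the readings above relate to that 15% threshold, here is a minimal sketch. It assumes the alert compares available virtual memory (100% minus the used percentage RTMT shows) against the configured low threshold; the function names are hypothetical and not part of RTMT.

```python
# Hypothetical sketch: assumes the low-virtual-memory alert fires when
# available virtual memory (100% minus the "used" figure shown in RTMT)
# drops strictly below the configured threshold (15% on this system).
LOW_THRESHOLD_PCT = 15.0

def available_pct(used_pct: float) -> float:
    """Percentage of virtual memory still free, given the used percentage."""
    return 100.0 - used_pct

def alert_fires(used_pct: float, threshold_pct: float = LOW_THRESHOLD_PCT) -> bool:
    """True if available virtual memory is below the threshold."""
    return available_pct(used_pct) < threshold_pct

# Idle readings reported on 12.5.1 (85% to 95% used)
for used in (85.0, 90.0, 95.0):
    print(f"used={used:.0f}% available={available_pct(used):.0f}% alert={alert_fires(used)}")
```

Under that assumption, 85% used sits exactly on the 15% threshold, so anything above it (90%, 95%) would trigger the alert.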

This only started happening after we upgraded to 12.5.1; on 11.5.1 the same hardware was fine.

What is going on here? Is the 100-agent OVA capable of running 12.5.1 at all?

Advice/help appreciated!

I recently updated all the certificates, so everything has been rebooted recently. I purged the database, and utils dbreplication shows replication is perfect.

Thanks

James

License:

Package: Cisco Unified CCX Enhanced
Total IVR Port(s): 100
Cisco Unified CCX Enhanced Seat(s): 50
High Availability : Enabled
Cisco Unified CCX Compliance Recording Seat(s): 50
Cisco Unified CCX Maximum Agents: 100

Jonas Fraga · 05-17-2021

Hello Keanej.

Have you reserved the 10 GB of vRAM in VMware? Is there a chance of resource contention (e.g., you have only 10 GB of vRAM free and 2 VMs requesting it)?

Another possibility: are you using Cloud Connect services? That increases the requirement from 10 GB to 14 GB of vRAM.

I've also seen memory and CPU spikes when there is a DB integration such as JDBC or ODBC running pooled sessions.

Are both nodes running low on virtual memory, or just the publisher? If only the publisher, have you tried moving agents to the subscriber to see how the load is affected?

keanej · 05-18-2021

Jonas, thanks a lot for the reply.

To answer those questions: only 4 GB of the 10 GB was 'reserved'.

We're not using any cloud services; it's a simple setup with no ODBC connections. Both nodes are running low on virtual memory, the pub slightly worse.

I haven't tried moving agents to the subscriber... the usage numbers below are with only 2 agents in use!

So I powered down both servers and reserved all the memory (10 GB now reserved).

After powering up, things do seem a bit better now.

For sub/pub: virtual memory 80/85, common partition 72/74, swap 55/70 (all in %).

Crucially, I'm not getting system alerts about dangerously low memory!

So all in all it's a bit better, but these numbers are at idle. In Finesse, the page resets/becomes unreachable every 5 minutes, and your settings reset when it regains connection. It doesn't drop calls, but agents report being set to 'Not Ready' after the reset.

I get the feeling something isn't right here. Is anyone else running the 100-agent UCCX profile on 12.5.1?

Are these numbers normal?

Is one option to take a DR backup, rebuild on the 400-agent profile, restore the config, and contact TAC about restoring our 100-agent license?

Can I run the 400-agent hardware profile with a 100-agent license? In reality only 50 agents would ever be Ready, and even that is extremely rare, but is this configuration allowed?

I just don't feel comfortable running the hardware so hot when idle.

Jonas, what do you think? I'm in a bit of a pickle.

Thanks

James

keanej · 05-19-2021

Just a quick update:

Those alerts have started again, loads of these:

Available virtual memory below configured threshold . Configured low threshold is 15 % java#4 (5060 MB) uses most of the memory. The alert is generated on Wed May 19 18:09:34 IST 2021 on node XXXXXXX
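To put the java#4 figure in proportion, here is a quick calculation. It assumes 10 GB means 10 x 1024 MB and treats the 5060 MB in the alert as that process's whole footprint (the alert does not say whether it is resident or virtual size). The 20 GB case is the memory used for the 400-agent rebuild described later in this thread; the function name is hypothetical.

```python
# Hypothetical sketch: share of VM memory taken by the java#4 process from the alert.
# Assumptions: 1 GB = 1024 MB, and 5060 MB is the process's total footprint.
JAVA_PROCESS_MB = 5060

def share_of_ram(process_mb: float, total_mb: float) -> float:
    """Percentage of total VM memory used by one process."""
    return 100.0 * process_mb / total_mb

# 10 GB = 100-agent OVA; 20 GB = memory used for the 400-agent rebuild
for ram_gb in (10, 20):
    total_mb = ram_gb * 1024
    print(f"{ram_gb} GB VM: java#4 uses {share_of_ram(JAVA_PROCESS_MB, total_mb):.1f}% of RAM")
```

Under those assumptions, this one Java process alone accounts for roughly half of the 100-agent VM's RAM, and about a quarter at 20 GB.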

bill.king1 · 05-19-2021

This doesn't look exactly like your scenario, but it's similar (12.5 system, memory usage alerts, etc.), and it was last updated yesterday.

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvx74769

keanej · 06-01-2021

Looks like it's a bug all right. I'm working with TAC on it; apparently there will be a new ES release or something.

Not sure if that's the correct bug; I'll try to find out and get back to you.

James

keanej · 07-01-2021

Just an update on this:

This wasn't solved, even after many, many sessions with Cisco TAC.

If you are on the latest version of UCCX, DON'T USE the 100-agent OVA profile; it's not up to the task.

We have moved to the 400-agent profile and everything is working perfectly. And yes, before you ask, you can run the 100-agent license on the 400-agent profile; the Cisco License Team were great in that regard.

The rough procedure is:

1. Take a DR backup of the system, and keep a copy of your current license / license MAC.
2. Spin down the VMs.
3. Spin up two new VMs with the 400-agent profile and increase the memory to 20 GB (the OVA was wrong at 16 GB for me).
4. Install a fresh copy of UCCX 12.5.1 on the primary.
5. Restore the DR backup to the primary.
6. Build the secondary and restore the backup on it.

Please note there were a lot of replication/syncing issues for Finesse and UCCX once both boxes were up. I did maybe 4 reboots of each box, but stick with it. Basically, the primary is king: keep pushing the primary's data onto the sub/backup box.

Hope that helps

James

vijendra Saravanamuthu · 12-09-2021

I ran into the same bug:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvx74769

Cisco said ES02 would fix this. I applied ES02 and also rebuilt the server on the 400-agent OVA template, but the issue persists. I still have the case open. TAC now says this:

"I talked to developers again and they are now telling me that actually they have recently received other cases about the current version still having issues with bug CSCvx74769.

Because of that they are reviewing the code in ES02 again.

I will be following up with them about it and will let you know what their update is."

Good luck.

vijay

mohameds.yusuf · 01-27-2022

Hi, just an FYI: we have been hitting this issue for a few months. We applied ES03 for some Log4j vulnerability fixes, but that didn't resolve CSCvx74769. TAC told us this will be addressed around March 2022 in 12.5SU2, but they have a fix, "ucos.CiscoJCEProvider5_3_7_ES.cop", if you are brave and want to test it in production.

Daniel Bosch · 04-12-2023

We're running 12.5.1SU2ES04 on the 100-agent OVA with 10 GB RAM and appear to be hitting this bug too. Has anyone installed ucos.CiscoJCEProvider5_3_7_ES.cop with positive results? I don't see it published, so I assume I'd need to open a TAC case to get it.

Eddgar Rojas · 10-21-2024

Well, I have had a cluster with the same memory issue for a very long time, but at last bug CSCvx74769 was finally resolved with 12.5.1SU3, 5 days ago.