1455 Views · 10 Helpful · 5 Replies

Understanding Intercluster Trunks better

shikamarunara
Level 4

Hello,

     I have an ICT running between cluster A (4 servers) and cluster B (7 servers).  I've been monitoring the performance of both clusters in RTMT and have noticed a consistent spike in calls, accompanied by a spike in CPU utilization.  This leads to sporadic error messages on some callers' phones saying, "High Traffic, try again later".

     What's particularly odd is that the CPU spike is across all subscribers on cluster A (not the publisher).  The ICT primarily uses subscriber 1.  This has led me to wonder whether the calls are causing the spike or something else is to blame.  If an ICT is configured with a certain CallManager Group and the first server can't take the call, does that call roll over to the subsequent servers in that group?


5 Replies

Steven Griffin
Level 4

It would help to know the version of Call Manager you are using, and whether this is a gatekeeper-controlled or non-gatekeeper-controlled trunk.

-Steven

Please help us make the communities better. Rate helpful posts!

Hi Steven,

     Thanks for replying.  This is CUCM 8.0 with non-gatekeeper-controlled trunks.

You may be having a call loop; however, you would see the call hitting the first destination address on the trunk, and on the other side the call should be processed by the first server in the trunk's CallManager Group.  Is the CallManager service running on the publisher?  An easy way to check for a call loop would be to watch the calls-attempted counter in the Real-Time Monitoring Tool and see if it jumps when you see the problem.  By the way, the "High Traffic, try again later" message means you hit a "Code Yellow"; you'll see an alarm for it in your application log.
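If you want to watch that counter outside RTMT, here is a minimal sketch that polls the CallsAttempted counter on each node via the Serviceability Perfmon SOAP API.  The endpoint, SOAP namespace, host names, and credentials below are assumptions from memory of the 8.x API; verify them against the WSDL on your own cluster before relying on this.

```python
# Sketch: poll CallsAttempted per node via the CUCM Serviceability
# Perfmon SOAP API. Endpoint, namespace, and counter names are
# assumptions to verify against your cluster's WSDL.
import time
import requests
import xml.etree.ElementTree as ET

CUCM = "cucm-pub.example.com"   # hypothetical node running serviceability services
NODES = ["sub1", "sub2"]        # hypothetical nodes to watch
AUTH = ("apiuser", "secret")    # hypothetical app user with serviceability rights

ENVELOPE = """<soapenv:Envelope
  xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:soap="http://schemas.cisco.com/ast/soap">
 <soapenv:Body>
  <soap:perfmonCollectCounterData>
   <soap:Host>{host}</soap:Host>
   <soap:Object>Cisco CallManager</soap:Object>
  </soap:perfmonCollectCounterData>
 </soapenv:Body>
</soapenv:Envelope>"""

def calls_attempted(host):
    """Return the CallsAttempted value for one node, or None if not found."""
    resp = requests.post(
        f"https://{CUCM}:8443/perfmonservice2/services/PerfmonService",
        data=ENVELOPE.format(host=host),
        auth=AUTH,
        verify=False,  # lab shortcut; use proper certs in production
        headers={"Content-Type": "text/xml",
                 "SOAPAction": '"perfmonCollectCounterData"'})
    resp.raise_for_status()
    # The response lists counters as Name/Value pairs in document order,
    # e.g. Name = \\sub1\Cisco CallManager\CallsAttempted.
    name = ""
    for el in ET.fromstring(resp.content).iter():
        tag = el.tag.rsplit("}", 1)[-1]
        if tag == "Name":
            name = el.text or ""
        elif tag == "Value" and name.endswith("\\CallsAttempted"):
            return int(el.text)
    return None

last = {}
while True:
    for node in NODES:
        now = calls_attempted(node)
        if now is None:
            print(f"{node}: counter not found")
            continue
        jump = now - last.get(node, now)
        print(f"{node}: CallsAttempted={now} (+{jump} since last poll)")
        last[node] = now
    time.sleep(30)  # a sudden jump in the delta is the loop signature
```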

My thoughts went to the call-loop possibility as well; however, I'm trying to confirm the mechanical aspect.  I see all of the servers in the cluster spike at exactly the same time.  If the first server in an ICT's CallManager Group (CMG) is unavailable, does the second server in the CMG become engaged to process the call?  That may explain why all of the servers are affected by the loop.  My understanding, also, is that the "High Traffic" notification occurs when a call control server takes too long to process a call (over 20 msec).
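To make the mechanics I'm asking about concrete, here is a toy model in Python.  This is illustrative only, not CUCM code, and the server names are made up; it assumes the group is tried in priority order and the call rolls over when a server can't take it, which is exactly the behavior in question.

```python
# Toy model of CallManager Group rollover -- not CUCM code. It assumes
# servers in the group are tried in priority order; whether the ICT on
# 8.0 really behaves this way is the open question in this thread.
def place_call(cmg, can_take_call):
    for server in cmg:            # priority order within the group
        if can_take_call(server):
            return server         # first server able to take it handles the call
    raise RuntimeError("no server in the CallManager Group could take the call")

cmg = ["sub1", "sub2", "sub3"]    # hypothetical group for the ICT
# If sub1 is overloaded, the call would roll over to sub2:
print(place_call(cmg, lambda s: s != "sub1"))
```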

The CPU spikes do indeed occur at the same time as the spike in calling.

I think I understand the mechanics of the Intercluster Trunk now.  From the SRND:

Using Run on All Active Unified Nodes with H.323 Non-Gatekeeper Intercluster Trunks

In this type of deployment, Run on all Active Unified CM Nodes is used by the H.323 non-gatekeeper intercluster trunks in each cluster. When defining this type of trunk, you may define up to 16 remote Unified CM servers in the destination cluster. (The number of remote servers that you need will depend on the number of active Unified CM nodes in the destination cluster.) The trunk will automatically load-balance calls across all defined remote destination Unified CM servers.

Only trouble is, this passage refers to the "Run on All Active Nodes" setting for ICTs found in CUCM 8.5.  I'm running 8.0, where this setting does not exist (the SRND covers the entire 8.x line).  Does this load-balancing occur on 8.0?  If so, it would explain why all servers are spiking at once, likely due to a call loop.
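If the 8.0 trunk does load-balance the way the SRND passage above describes, a looping call would be spread across every destination node rather than pinned to one.  A toy model of that amplification (illustrative Python with made-up node names, not CUCM code):

```python
# Toy model -- not CUCM code. If the trunk round-robins call attempts
# across all defined remote destinations (per the SRND passage quoted
# above), a single looping call lands on every node, which would match
# all subscribers spiking at once.
from itertools import cycle
from collections import Counter

remote_nodes = ["subB1", "subB2", "subB3", "subB4"]  # hypothetical destinations
next_node = cycle(remote_nodes)                      # round-robin load balancing
handled = Counter()

LOOP_HOPS = 100   # hypothetical: the loop re-offers the call until something stops it
for _ in range(LOOP_HOPS):
    handled[next(next_node)] += 1  # each re-attempt burns CPU on some node

print(handled)    # attempts spread evenly: every node spikes, not just one
```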

Can anyone tell me if there's a maximum number of calls an ICT can handle?