webberr · 08-25-2014

This is an "answer" rather than a question, but I wanted to post it because it took a month or two to resolve, even with the help of the TAC, and I am hoping it may benefit others who hit the same issue.

We had several VG350 gateways using SCCP to register to our 8.6.2 CallManager cluster. Users complained that at times they would pick up their analog phone and get no dial tone. By the time our tech reached the location (10-15 minutes later), the problem would be gone and everything would be working normally.

What we found was that in our IOS version (15.2(4)M4), the SCCP process sent the hello/keepalive packets for all ports to CallManager at once, so all 144 replies came back at once. On our rather fast network, the 144 replies arrived so quickly that they overran the input queue on the Gigabit port, resulting in drops. If an analog SCCP port had 3 consecutive keepalives dropped (which some did), it would de-register from CallManager, leaving the user with no dial tone. The problem usually happened when there was also real (RTP) traffic in progress on other ports, which added to the load on the input queue. Typically, within a minute or two the SCCP keepalives would be processed correctly and the port would re-register.
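The de-registration behavior described above can be sketched as a simple miss counter. This is a hypothetical model for illustration only, not Cisco's SCCP implementation: `registration_timeline` and `MISS_LIMIT` are made-up names, each list entry stands for one keepalive interval, and re-registration is simplified to happen as soon as a reply gets through (in practice it took a minute or two).

```python
# Hypothetical model of the keepalive behavior described in the post,
# not Cisco's actual SCCP code.
MISS_LIMIT = 3  # consecutive missed keepalive replies before de-registration


def registration_timeline(replies: list[bool], miss_limit: int = MISS_LIMIT) -> list[bool]:
    """Return the port's registration state after each keepalive interval.

    replies[i] is True if the keepalive reply for interval i reached the port,
    False if it was dropped in the input queue.
    """
    registered = True
    misses = 0
    states = []
    for got_reply in replies:
        if got_reply:
            misses = 0
            # Simplification: the port re-registers as soon as a reply gets through.
            registered = True
        else:
            misses += 1
            if misses >= miss_limit:
                registered = False  # port de-registered: no dial tone for the user
        states.append(registered)
    return states


# Two drops in a row are survivable; three in a row de-register the port.
print(registration_timeline([True, False, False, True]))   # [True, True, True, True]
print(registration_timeline([False, False, False, True]))  # [True, True, False, True]
```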

Factors aggravating the problem were: a high-speed (LAN) network between CallManager and the VG350 (a WAN connection would likely have slowed the keepalive replies enough for them to be processed correctly); having all 144 ports configured (fewer ports in use would have meant smaller keepalive bursts); and ports with a good amount of activity, which added traffic to the input queues of the Gigabit ports, where the drops were being seen.

The solution was an easy one once the problem was fully understood: adding the "hold-queue 300 in" command to both interface GigabitEthernet0/0 and GigabitEthernet0/1. This increased the input queue from the IOS default of 75 packets to 300 packets and allowed the burst of keepalive replies to be held in the queue until they could be processed.
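A rough way to see why the burst overran the queue, why the aggravating factors mattered, and why a deeper hold queue fixed it is a tail-drop FIFO model of the interface input queue. This is a simplified sketch, not how IOS actually schedules packets: the function name `count_drops`, the per-tick arrival and drain rates, and the background packet count are all assumptions chosen for illustration, and the printed numbers are not measurements from the gateways in this post. A fast LAN is modeled as delivering the whole burst within one processing tick, while a slower link spreads it out.

```python
def count_drops(burst: int, queue_limit: int, arrive_per_tick: int,
                drain_per_tick: int, background: int = 0) -> int:
    """Count packets tail-dropped when a burst hits a bounded FIFO input queue.

    burst:           keepalive replies in the burst (e.g. one per configured port)
    queue_limit:     input queue depth (the interface's input hold queue)
    arrive_per_tick: replies arriving per processing tick (higher on a fast LAN)
    drain_per_tick:  packets the CPU removes from the queue per tick (assumed)
    background:      packets (e.g. RTP) already waiting when the burst starts
    """
    if arrive_per_tick <= 0:
        raise ValueError("arrive_per_tick must be positive")
    queued = min(background, queue_limit)
    remaining = burst
    drops = 0
    while remaining > 0:
        arriving = min(arrive_per_tick, remaining)
        remaining -= arriving
        space = queue_limit - queued
        accepted = min(arriving, space)
        drops += arriving - accepted  # tail drop: no room left in the queue
        queued += accepted
        queued = max(queued - drain_per_tick, 0)  # CPU processes some packets
    return drops


print(count_drops(144, 75, 144, 10))                 # LAN, all 144 ports: 69
print(count_drops(144, 75, 144, 10, background=20))  # plus RTP traffic: 89
print(count_drops(72, 75, 72, 10))                   # half the ports: 0
print(count_drops(144, 75, 10, 10))                  # slower, WAN-like arrival: 0
print(count_drops(144, 300, 144, 10))                # hold-queue 300 in: 0
```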

Cisco stated they were investigating staggering the SCCP keepalives in a future IOS release to avoid this issue, but the above solution has worked perfectly for us so far.
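Staggering would spread the keepalives across the keepalive interval instead of sending them all at once. The sketch below is a hypothetical illustration of that idea, not Cisco's implementation: `staggered_offsets`, `peak_per_window`, the 30-second interval, and the 10 ms window are assumptions, and it treats send offsets as reply arrival times (i.e. a constant round-trip time). It compares the worst-case number of keepalives landing in the same short window.

```python
from collections import Counter


def staggered_offsets(ports: int, interval_ms: int) -> list[int]:
    """Spread one keepalive per port evenly across the keepalive interval."""
    return [i * interval_ms // ports for i in range(ports)]


def peak_per_window(offsets: list[int], window_ms: int) -> int:
    """Largest number of keepalives falling in the same window_ms time slot."""
    buckets = Counter(offset // window_ms for offset in offsets)
    return max(buckets.values())


PORTS = 144           # all ports configured, as in the post
INTERVAL_MS = 30_000  # assumed keepalive interval, for illustration only
WINDOW_MS = 10        # assumed slot size

all_at_once = [0] * PORTS
print(peak_per_window(all_at_once, WINDOW_MS))                            # 144
print(peak_per_window(staggered_offsets(PORTS, INTERVAL_MS), WINDOW_MS))  # 1
```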

Rob.

VG350 SCCP Ports losing dial tone