Can someone help me here?
Our IP phones are resetting and restarting frequently. Details are given below. It is not affecting our active calls.
9:38:38a 14: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=UCM-closed-TCP
9:38:38a 18: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=Failback
9:40:10a 10: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=TCP-timeout
9:41:11a 14: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=UCM-closed-TCP
9:41:11a 18: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=Failback
10:09:49a 10: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=TCP-timeout
10:09:51a 23: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=Reset-Restart
10:28:00a 10: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=TCP-timeout
10:28:10a 23: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=Reset-Restart
App Load ID jar45sccp.9-0-3TH1-22.sbn
Boot Load ID tnp65.8-3-1-21a.bin
CUCM Version 7.1.5
Thanks in advance
Looks like they are losing their connections to CUCM. Check the heartbeats (keepalives) between the IP phone below and its primary and backup CUCM servers. It seems to be bouncing between primary and backup for some reason. Possible causes:
- WAN connection issue
- busy network
- connection issue at CUCM
- mismatched port speed at the phone/switch
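To see how often the phone bounces and for what reason, the status messages from the original post can be tallied by their Last= value. This is only a rough sketch in Python: the regular expression is my own guess at the line layout based on the nine lines posted above, and tally_reasons is a made-up helper name, not anything from Cisco.

```python
import re
from collections import Counter

# Assumed layout, inferred from the posted lines, e.g.:
# "9:38:38a 14: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=UCM-closed-TCP"
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2}[ap])\s+(?P<code>\d+):\s+"
    r"Name=(?P<name>\S+)\s+Load=\s*(?P<load>\S+)\s+Last=(?P<last>\S+)$"
)


def tally_reasons(lines):
    """Count how often each Last= reason appears; skip lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("last")] += 1
    return counts


# The status messages exactly as posted in the question.
STATUS = """\
9:38:38a 14: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=UCM-closed-TCP
9:38:38a 18: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=Failback
9:40:10a 10: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=TCP-timeout
9:41:11a 14: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=UCM-closed-TCP
9:41:11a 18: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=Failback
10:09:49a 10: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=TCP-timeout
10:09:51a 23: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=Reset-Restart
10:28:00a 10: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=TCP-timeout
10:28:10a 23: Name=SEPECC882B0AD77 Load= SCCP45.9-0-3S Last=Reset-Restart
""".splitlines()

if __name__ == "__main__":
    for reason, count in tally_reasons(STATUS).most_common():
        print(f"{reason}: {count}")
```

If TCP-timeout dominates the tally, that points more toward the network path than toward the phone load itself.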
Did you guys ever find a solution to this? I am having this problem on a Gig LAN network from the office across the street, which is connected over fiber. I am pinging the devices constantly and there are no drops, but the user's 7942 phones show "CM Fallback Service Operating". There are no network drops and all other applications work without any issues. Other phones on the LAN also seem to work fine.
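One thing worth keeping in mind is that ICMP ping does not exercise the TCP session the phone actually uses for SCCP signalling (TCP port 2000 by default). Below is a rough Python sketch that times plain TCP connects to that port instead. probe_tcp and probe_repeatedly are made-up helper names, and a successful connect only shows the port is reachable at that moment; it will not reveal a firewall that drops an idle session later.

```python
import socket
import time

SCCP_PORT = 2000  # default SCCP signalling port on CUCM


def probe_tcp(host, port=SCCP_PORT, timeout=3.0):
    """Return the TCP connect time in seconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None


def probe_repeatedly(host, count=5, interval=1.0, port=SCCP_PORT):
    """Probe several times; each entry is a connect time or None on failure."""
    results = []
    for attempt in range(count):
        results.append(probe_tcp(host, port))
        if attempt < count - 1:
            time.sleep(interval)
    return results
```

For example, probe_repeatedly("cucm-pub.example.com") run from the phone's VLAN would show whether connects to the call manager fail or slow down while pings still succeed. The hostname here is a placeholder.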
We had a similar issue with 7962 and 7965 phones. They worked perfectly in my office, but when moved to the end user's office, a phone would restart continuously, 5 to 30 seconds after registering. The office where the end user is located uses an HP switch (78XX phones, on the other hand, work there with no problem). I went through every possible configuration and debugging option before finally finding VLAN issues in the phone logs (old vlan 4096, new vlan 4095).
The issue was resolved by setting VLAN tagging on the Cisco phone and the HP switch; the default router, a Cisco, was behind the HP switch.
Thanks for your solution, Ales. I actually resolved our problem by factory resetting the phones and, after gaining access to the OS, rebooting the CUCM servers, which had been up for nearly 2 years. It has not recurred since. I should note that all our equipment is on Gig Cisco 3850 switches with fiber connectivity, and the network does not appear to have been a problem.
Hope this helps others.
This sounds very much like a TCP timeout issue, normally caused by some sort of stateful filtering between the CUCM and the end user's handset.
We had the same issue caused by Check Point firewalls. There is a known bug in SecureXL that affects TCP packets if you use a PPPoE link.
This is still an active bug on Gaia 77.30.
The point is to make sure you do not have any filtering that would interfere with, or force different timeout values on, your TCP packets.
Can anyone please help us with this issue?
If you are using third-generation phones (7970, 79x1, 79x2, 79x5), then there is no fix for it.
This is the latest update I received.
There are a couple of things that need to be kept in mind:
1. Phones will unregister from the CUCM and register to the SRST GW
2. GW will tear down Q.931 backhauling from CUCM and will function as a standalone call agent.
The phones will lose either SCCP or TCP keepalives. TCP keepalives being missed generally trigger SRST much faster than SCCP keepalives being missed.
The phones will register to the GW even if Q.931 backhauling has not been torn down. Hence, there might be a brief period where the phones have registered to the GW but the PRI is still MGCP controlled. In such a case, calls will initially fail. After some time (and this is post the 30 second MGCP KA) the Q.931 backhauling will be torn down (isdn bind-l3 ccm-manager will be removed) and the PRI will function as if it has been configured on an H.323 GW. Here we need to understand that the phones showing "Registering" or "CM Fallback Service Operating" are not indicative of the GW going into SRST. The phones will go into SRST much faster than the Q.931 backhauling being torn down.
The behavior you are seeing is the mechanism for timing out a TCP connection and has nothing to do with the SCCP keepalive itself. Any time the phone sends a TCP packet to the server and does not receive a TCP ACK, it retransmits the packet at decreasing intervals until the session times out (the phone sends a TCP RST), at which point the phone fails over to the next CCM server or SRST reference.
The SCCP keepalives are sent at regular intervals, based on a value presented to the phone during registration (30 seconds by default). If the phone gets a TCP ack for the keepalive, but no SCCP keepaliveAck from the server then you can get into the situation where the phone unregisters due to keepalive timeout (after 2 or 3 such missed keepaliveAcks).
The former is a network problem; the latter is an application problem, where the network layer of the CCM server acknowledges that the message was received but the CCM application does not respond.
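The distinction between the two failure modes can be written down as a small decision rule. This is only an illustrative Python sketch, not how the phone firmware is implemented: KeepaliveResult and diagnose are made-up names, the limit of 3 is an assumption taken from the "2 or 3" figure above, and the TCP retransmit sequence is collapsed into a single "no ACK" outcome.

```python
from dataclasses import dataclass


@dataclass
class KeepaliveResult:
    tcp_acked: bool   # did the server's TCP stack ACK the segment?
    sccp_acked: bool  # did the CCM application answer with a KeepAliveAck?


MISSED_ACK_LIMIT = 3  # assumption: the post above says "2 or 3" missed acks


def diagnose(results, missed_limit=MISSED_ACK_LIMIT):
    """Classify a sequence of keepalive outcomes, oldest first."""
    missed = 0
    for result in results:
        if not result.tcp_acked:
            # Simplification: retransmits are not modelled, so an un-ACKed
            # segment is treated as the TCP session timing out.
            return "network: TCP timeout"
        if result.sccp_acked:
            missed = 0  # a good KeepAliveAck resets the miss counter
        else:
            missed += 1
            if missed >= missed_limit:
                return "application: keepalive timeout"
    return "registered"
```

For example, three results in a row with tcp_acked=True and sccp_acked=False give "application: keepalive timeout", while a single result with tcp_acked=False gives "network: TCP timeout".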
You will note in your example that when the phone registers with the SRST router, the SCCP AlarmMessage it sends will contain a string like "last=TCP Timeout" or similar.
The 3rd gen phones (7970, 79x1, 79x2, 79x5) are much more aggressive in timing out the TCP session than the 2nd gen phones. What took your 7960 26 seconds to unregister will take a 7965 about 8 seconds.
I once had a Juniper firewall between a remote site and CUCM, and the SCCP keepalives were being delayed. This caused issues with some phones, of course. I'm not sure whether you have this or something similar. You may also adjust the SCCP keepalive trigger in CUCM to a higher value, which may help as well.