08-10-2009 07:35 AM - edited 03-15-2019 07:17 PM
Guys, I have the following topology:
CUCM---[switch1]-------[switch2]-----IP phones
|
|
SRST router
Both switches carry data VLANs and voice VLANs.
Some of the phones fail over to the SRST router at random times while the others stay registered to the CUCM.
There is a trunk port between both switches.
I checked the traces on the CUCM and found a "socket broken" message whenever an IP phone unregisters from the CUCM.
I plugged an IP phone directly into switch1 to check whether the same problem persists, and the test phone works fine.
I also checked for input/output errors on the interfaces of both switches, but everything looks clean.
Has anyone faced a similar problem before?
Please advise.
Regards,
Moustapha
08-10-2009 10:37 AM
Hi Moustapha,
It seems more like a connectivity issue at the switch level than anything else.
Could you please verify that connectivity between the two switches, and between the switches and the phones, is clean?
You may want to clear the interface counters, let them run for at least 24 hours, then do a show interface and check whether the error counters are increasing.
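The before/after comparison can be sketched as below. This is only a minimal illustration: the counter names and values are hypothetical, typed in by hand from two readings of show interface, and nothing here talks to a switch.

```python
def growing_counters(before, after):
    """Return {counter: delta} for every counter that increased between two snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }


# Hypothetical readings taken 24 hours apart on one switch port.
before = {"input errors": 0, "CRC": 0, "output errors": 2}
after = {"input errors": 5, "CRC": 0, "output errors": 2}

print(growing_counters(before, after))
```

A counter that is non-zero but not growing (like the output errors here) is usually old history; the ones that keep climbing are the ones worth chasing.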
Please rate if it helps
Regards
wilson Samuel
08-10-2009 09:06 PM
Well,
I already checked that, and no errors are displayed on the interfaces.
The CUCM and SRST router configs are correct, and I am still investigating this issue.
If anyone has any other suggestions, please help.
Thanks
08-11-2009 12:04 AM
Isolate the problem further.
1) Phone firmware load versions: are they the same for all phones of the same model?
2) Switch OS versions: are they the same?
3) Locate a phone that failed over and note the switch port it is connected to. Swap it with a phone that has NEVER failed over, note the switch and port that phone was connected to, and observe.
05-19-2011 08:07 AM
I am also experiencing this issue. Have you been able to resolve it, OP? Do you recall what it was?
Can anyone else provide any input? Has anyone else run into this and successfully resolved it?
I pulled CUCM traces and see the following for multiple phones during the reported SRST timeframe:
My topology is simplified compared to the OP's:
SRST router
|
|
CUCM1&2---[core switch]-------rest of network, switches, IP phones
|
|
IP Phones
Unfortunately I did not catch the show log from the router before it buffered out.
Any help would be greatly appreciated!
05-19-2011 08:35 AM
Sounds like a network problem somewhere between the IP phones and call manager. How often is this happening? Any network issues?
05-19-2011 08:50 AM
As far as I know, this is the first time it has ever happened.
The system has been running stable for years. The router was rebooted back in December and has been running stable as well.
None of the remote sites experienced any issues at this time, so we know the connectivity between GW<----->Switch<----->CMs is sound and stable.
Pinging the CM from router is good:
#ping 10.11.1.131
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.11.1.131, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms
#ping 10.11.1.132
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.11.1.132, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms
#trace 10.11.1.131
Type escape sequence to abort.
Tracing the route to 10.11.1.131
1 10.11.1.131 0 msec 0 msec 4 msec
#trace 10.11.1.132
Type escape sequence to abort.
Tracing the route to 10.11.1.132
1 10.11.1.132 0 msec 0 msec 0 msec
Pinging the router from CM is good:
admin:utils network ping 10.11.1.130
PING 10.11.1.130 (10.11.1.130) 56(84) bytes of data.
64 bytes from 10.11.1.130: icmp_seq=0 ttl=255 time=1.05 ms
64 bytes from 10.11.1.130: icmp_seq=1 ttl=255 time=0.625 ms
64 bytes from 10.11.1.130: icmp_seq=2 ttl=255 time=0.681 ms
64 bytes from 10.11.1.130: icmp_seq=3 ttl=255 time=0.641 ms
--- 10.11.1.130 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3098ms
rtt min/avg/max/mdev = 0.625/0.750/1.055/0.179 ms, pipe 2
admin:utils network traceroute 10.11.1.130
1 10.11.1.130 (10.11.1.130) 0.646 ms * 0.540 ms
admin:utils network ping 10.11.1.130
PING 10.11.1.130 (10.11.1.130) 56(84) bytes of data.
64 bytes from 10.11.1.130: icmp_seq=0 ttl=255 time=0.527 ms
64 bytes from 10.11.1.130: icmp_seq=1 ttl=255 time=0.575 ms
64 bytes from 10.11.1.130: icmp_seq=2 ttl=255 time=0.517 ms
64 bytes from 10.11.1.130: icmp_seq=3 ttl=255 time=0.632 ms
--- 10.11.1.130 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3013ms
rtt min/avg/max/mdev = 0.517/0.562/0.632/0.054 ms, pipe 2
admin:utils network traceroute 10.11.1.130
1 10.11.1.130 (10.11.1.130) 0.695 ms * 0.654 ms
05-19-2011 09:00 AM
The SDL traces show the following for the same IP address (just using one of the phones):
049538145
2011/05/16 10:05:11.183
001
AlarmErr
AlarmClass: CallManager
AlarmName: DeviceTransientConnection
AlarmSeverity: Error AlarmMessage:
AlarmDescription: Transient connection attempt.
AlarmParameters: ConnectingPort:13311
DeviceName:
IPAddress:10.11.3.120
Protocol:SCCP
DeviceType:255
Reason:6
AppID:Cisco CallManager
ClusterID:CALLMGR1-Cluster
NodeID:cm1
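When many of these alarms show up in a trace, it can help to tally them per phone IP. The following is a rough sketch only: it assumes the alarm text looks like the excerpt above (key:value fields such as AlarmName, IPAddress, Protocol, Reason), and the function names are made up for illustration.

```python
import re
from collections import Counter

# Field names taken from the alarm excerpt above; the layout is an assumption.
FIELD = re.compile(r"\b(AlarmName|IPAddress|Protocol|Reason):[ \t]*(\S+)")


def parse_alarm(text):
    """Extract the fields of interest from one alarm's text into a dict."""
    return dict(FIELD.findall(text))


def transient_connections_by_ip(alarms):
    """Count DeviceTransientConnection alarms per phone IP address."""
    counts = Counter()
    for text in alarms:
        fields = parse_alarm(text)
        if fields.get("AlarmName") == "DeviceTransientConnection" and "IPAddress" in fields:
            counts[fields["IPAddress"]] += 1
    return counts
```

Feeding it one string per alarm gives a quick picture of whether the same handful of phones keeps showing up or the events are spread across the network.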
07-14-2011 04:49 PM
We are having this same "socket broken" issue. Did you ever find a resolution?
07-14-2011 05:57 PM
Hi
We need to isolate the issue first...
Is the phone, the switch, or the Call Manager the culprit?
The concept is simple: as soon as the phone loses connectivity and does not receive a keepalive within 90 seconds, it fails over to either the secondary Call Manager server or SRST mode.
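As a rough illustration of that timer logic (not the phone's actual implementation), the sketch below takes the times at which keepalive responses were received and reports when a 90-second silence would trigger failover. The 90-second default comes from the description above; real timers are configurable, and the function name is invented for this example.

```python
def first_failover_time(ack_times, timeout_s=90.0, end_time=None):
    """Return the time (seconds) at which the phone would give up on its
    call manager, or None if no silence ever exceeded timeout_s.

    ack_times: times at which keepalive responses arrived.
    end_time:  optional end of the observation window, so that a trailing
               silence after the last response is also checked.
    """
    last = None
    for t in sorted(ack_times):
        if last is not None and t - last > timeout_s:
            return last + timeout_s
        last = t
    if last is not None and end_time is not None and end_time - last > timeout_s:
        return last + timeout_s
    return None


# Responses every 30 s, then nothing between t=60 and t=200.
print(first_failover_time([0, 30, 60, 200]))
```

Lining the computed time up against the unregister timestamps in the CUCM traces shows how long before the event the keepalives must have stopped getting through.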
HTH
Tapan
07-14-2011 05:59 PM
Adding to the above: check the Debug Display on the phone's web page (accessible when you click on the phone's IP address in CCM).
07-15-2011 12:31 AM
Hi,
I think the cause was already isolated when you transferred the phones to a different switch. Did you find anything from switch 1?
07-15-2011 07:10 AM
Let me try to respond to the new posts:
The issue only occurred once. The client requested a reason for outage, which is what started the investigation. We collected packet captures, traces, logs, the whole nine yards. We did not find any issues internally on the network. The traces and logs pulled from when the issue occurred all point to no connectivity to CallManager, but give no reason why. We are not dropping packets, and the response times are better than ideal. The issue has not occurred since.
What I find extremely strange is:
- Only a few random phones went into SRST
- The CUCM and the router are both connected via the core switch, so it seems odd that the phones would lose connectivity to the CM but not to the router, since they can clearly communicate through that switch
- Other phones were working just fine at the time
- The phones were scattered across different ports and backplanes
- I highly doubt it's hardware failing on specific ports; that is very unlikely to happen all at the same time and then work again afterwards
- We had no network monitoring alerts about critical devices losing connectivity (servers, router, switches)
- It was just a few random phones on the internal network that registered via SRST for a few minutes. It makes no sense, and we are unable to locate an answer. Cisco TAC was unable to assist with a reason for outage as well; their only suggestion was to wait for it to happen again, and if it does I doubt we'd be able to collect any different or better data
- And just to play devil's advocate: if it smells like a bug, walks like a bug, talks like a bug......
So the issue has never occurred again, but we are still in the dark as to why it happened. Sorry I don't have a better answer for everyone.