I came across some unexpected behavior yesterday and was wondering if someone could help shed some light on the problem. In my DNS I created the record _sip._udp.platautofinance.com, and when I query it using MX Toolbox it correctly reports the following:
On the SPA504 I have programmed the SRV records into the system using the following:
So here is the issue: the phones register correctly to the primary server, but after a length of time ALL the phones abandon ship, jump to the secondary server and stay there. I have to manually shut down the secondary and reboot the endpoints, at which point they re-register with the primary. So I think I need to understand the mechanism a bit better. Here is what I can think of:
The phones are not using the DNS servers of the provider that hosts our SRV records: we use OpenDNS rather than Network Solutions. Unlikely, because with broken DNS resolution the SPA would not be able to resolve the SIP SRV records or servers at all. Since the phones move from the primary to the secondary, I can tell that resolution is taking place on the phone: they are jumping between the right servers.
My understanding is that the proxy fallback interval should make the phones re-register: they should periodically check whether the primary is available again and, if so, re-register to the preferred server (the one with the lowest SRV priority value). That's not happening without a manual reboot; once on the secondary, the phones stay there.
The firewalls for both the primary and secondary servers do not answer ICMP.
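If the fallback mechanism worked as described in the second point above (after failing over, the phone re-checks the primary once per proxy fallback interval), the time by which the phones should be back on the primary is simple arithmetic. The sketch below is hypothetical: `expected_fallback_time` is an illustrative name, and it assumes the first re-check happens one full interval after failover, which is an assumption rather than documented SPA behavior.

```python
import math


def expected_fallback_time(failover_at: float, primary_back_at: float,
                           fallback_interval: float) -> float:
    """Return when a phone should be back on the primary (seconds).

    Hypothetical model: after failing over at `failover_at`, the phone
    re-checks the primary every `fallback_interval` seconds, with the first
    check one full interval after the failover.
    """
    if primary_back_at < failover_at:
        raise ValueError("primary cannot recover before the failover")
    # Number of intervals until the first check at or after recovery.
    checks = max(1, math.ceil((primary_back_at - failover_at) / fallback_interval))
    return failover_at + checks * fallback_interval


# Primary recovers 10 minutes after failover; fallback interval of one hour.
print(expected_fallback_time(0, 600, 3600))  # 3600
```

In the behavior reported in this thread, no such re-check ever moves the phones back, so this point in time is never reached.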
So the question is: why do the SPA phones think the primary is offline? What mechanism or criteria does the SPA use to decide that a server is offline and roll over to the secondary? And why do the phones stick with the secondary, which is on only a best-effort connection, when the primary has a bonded T?
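Part of the answer is likely that a SIP phone judges a server offline from SIP transaction timeouts, not from ICMP. RFC 3261 specifies that a client sending a non-INVITE request such as REGISTER over UDP retransmits it on Timer E (starting at T1 = 500 ms and doubling up to T2 = 4 s) and abandons the transaction when Timer F (64*T1 = 32 s) fires. The sketch below computes that schedule from the RFC defaults; whether the SPA firmware uses exactly these timers, and what it does after a timeout, is an assumption. `register_retransmits` is a hypothetical name.

```python
from typing import List, Tuple

T1 = 0.5  # RFC 3261 default RTT estimate, seconds
T2 = 4.0  # RFC 3261 maximum retransmit interval for non-INVITE, seconds


def register_retransmits(t1: float = T1, t2: float = T2) -> Tuple[List[float], float]:
    """Retransmission times (seconds after the first send) of an unanswered
    REGISTER over UDP, plus the Timer F timeout.

    Assumes no provisional (1xx) response ever arrives, so Timer E keeps
    doubling until it reaches T2.
    """
    timer_f = 64 * t1  # transaction timeout
    times: List[float] = []
    interval = t1      # Timer E starts at T1
    t = interval
    while t < timer_f:
        times.append(t)  # Timer E fired: retransmit the REGISTER
        interval = min(2 * interval, t2)
        t += interval
    return times, timer_f


retransmits, timeout = register_retransmits()
print(retransmits)  # [0.5, 1.5, 3.5, 7.5, 11.5, 15.5, 19.5, 23.5, 27.5, 31.5]
print(timeout)      # 32.0
```

With the defaults, an unanswered REGISTER is sent 11 times (the original plus 10 retransmissions) before the transaction fails at 32 seconds, so an outage on the path to the primary lasting roughly half a minute could plausibly be enough to trigger failover.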
Thanks as always for your help and input, group! I appreciate your time.
We've recently invested some time in configuring SIP phones on our network to support SRV records with automatic failover and fallback, and we have also come across this problem.
This works great with Polycom and Snom, but the Cisco SPA range doesn't seem to support SRV with automatic fallback.
If the primary SIP server is down, the Cisco fails over to the secondary as required within 5 minutes, but once the primary SIP server is back online the Cisco never falls back unless we reboot the phone.
Model: Cisco SPA 504G
Is this a bug, or does the Cisco SPA range simply not support SRV fallback?
The problem with the Cisco SPA 504G has developed further in our test environment: this morning the phone was frozen and unresponsive, which required a reboot.
We've reviewed our server logs after a few days of testing. The Cisco phone has been sending registration requests to both the primary and the secondary server, but whenever we check, the active registration appears to be on the primary, unless the phone has failed over to the secondary, in which case the active registration stays on the secondary.
We expect the Cisco phone to send registration requests only to the primary unless the primary is offline, in which case it should send requests to the secondary until the primary is back online, but our experience is very different.
We're using the following SRV records:
Points to: SIP01.server.co.uk
Points to: SIP02.server.co.uk
Points to: SIP01.server.co.uk
Points to: SIP02.server.co.uk
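The priority and weight fields of these records are not shown, so the values in the sketch below are assumptions. RFC 2782 has the client try targets in ascending priority order (lowest number first) and, among targets of equal priority, choose randomly in proportion to weight; weight is a load-balancing hint, not a preference order. A stdlib-only sketch of that selection follows (`SrvRecord` and `order_srv` are hypothetical names; priorities 10 and 20 and port 5060 are assumed).

```python
import random
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value = more preferred
    weight: int    # relative share among records of equal priority
    port: int
    target: str


def order_srv(records: List[SrvRecord], rng: random.Random) -> List[SrvRecord]:
    """Order SRV targets as RFC 2782 describes: ascending priority, then
    weighted-random selection within each priority group."""
    ordered: List[SrvRecord] = []
    by_priority = sorted(records, key=lambda r: r.priority)
    for _, group in groupby(by_priority, key=lambda r: r.priority):
        # RFC 2782: place zero-weight records at the start of the group.
        remaining = sorted(group, key=lambda r: r.weight != 0)
        while remaining:
            total = sum(r.weight for r in remaining)
            pick = rng.randint(0, total)  # inclusive range, per RFC 2782
            running = 0
            for i, r in enumerate(remaining):
                running += r.weight
                if running >= pick:
                    ordered.append(remaining.pop(i))
                    break
    return ordered


# Hypothetical priorities/weights for the two targets in this thread.
records = [
    SrvRecord(priority=20, weight=100, port=5060, target="SIP02.server.co.uk."),
    SrvRecord(priority=10, weight=100, port=5060, target="SIP01.server.co.uk."),
]
print([r.target for r in order_srv(records, random.Random(0))])
```

With distinct priorities the order is deterministic, which is what a primary/secondary arrangement needs; equal priorities would instead spread phones across both servers.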
These are the SRV-specific parameters we have configured on the Cisco phone:
We don't believe Cisco's implementation of SRV records is correct, is there something we're missing? Has anyone here successfully implemented SRV records for Cisco SPA phones?
We expect the Cisco phone to send registration requests only to the primary unless the primary is offline, in which case it should send requests to the secondary until the primary is back online
Do you have Dual Registration set to No?
Thanks for coming back to me, we currently have Dual Registration set to no.
We also have the following additional settings (if this helps us to diagnose the issue):
Auto Register When Failover set to no
Proxy Redundancy Method set to Normal
Register Expires set to 60
Register set to Yes
Well, if you have Dual Registration set to No, the phone should not register with both proxies. If it does, you have hit a firmware bug. Call Cisco support; no one else can help you with the firmware.
On the other hand, it sounds suspicious to me that two registrations to different proxies would cause harm. They should not. What is the problem with two registrations in your particular case?
We were seeing register packets hitting both servers, but the phone was only setting up the registration with one of the two servers. The two SIP servers are independent of one another, but hold the same configuration.
We haven't seen registrations hitting both servers since rebooting the phone, so we need to do more testing.
At the moment we suspect there may be some bugs in the firmware.
We were seeing register packets hitting both servers, but the phone was only setting up the registration with one of the two servers.
Still unclear to me. If you see REGISTERs from the phone to both servers, then the phone is attempting to set up registrations with both servers. At the same time, you say the phone sets up a registration with only one of them. That confuses me.
We've been reviewing this in more detail. We believe the Cisco was registering with the failover server because, at that point, the phone was stuck in failover mode; we don't believe it was registered with the primary server at the time, so there would have been only one registration. We were only able to resolve this by blocking the Cisco phone's public IP at the failover server for 10 minutes, at which point the phone re-registered with the primary.
We're repeating the test to make sure our results are consistent. We're slightly concerned that the Cisco froze the other morning and required a reboot, but this could be unrelated.
Sorry for the confusion. It would still be useful if the Cisco phone could fall back to the primary server automatically.
Everything is clear now. It's a well-known issue; I assumed you were describing a new one.
The Cisco never returns from the fallback server to the primary one unless the fallback server becomes unreachable. Yes, the SRV implementation is broken.
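The difference between this behavior and the expected fallback can be modeled in a few lines. This is a hypothetical sketch of the behavior described in this thread, not Cisco's code: `sticky_pick` keeps the current server as long as it answers and only then walks the priority list. It also shows why blocking the phone at the failover server forces it back to the primary.

```python
from typing import Callable, Optional, Sequence


def sticky_pick(current: Optional[str], servers: Sequence[str],
                is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Hypothetical model of the reported behavior: keep the current server
    while it answers, and only walk the priority list when it stops."""
    if current is not None and is_reachable(current):
        return current
    for server in servers:  # assumed sorted by SRV priority, primary first
        if is_reachable(server):
            return server
    return None


servers = ["SIP01.server.co.uk", "SIP02.server.co.uk"]
up = {"SIP01.server.co.uk": False, "SIP02.server.co.uk": True}
history = []

current = sticky_pick("SIP01.server.co.uk", servers, up.get)  # primary fails
history.append(current)
up["SIP01.server.co.uk"] = True   # primary is back online
current = sticky_pick(current, servers, up.get)               # still SIP02
history.append(current)
up["SIP02.server.co.uk"] = False  # workaround: block the phone at SIP02
current = sticky_pick(current, servers, up.get)               # back to SIP01
history.append(current)
print(history)
```

A priority-based policy would instead re-evaluate the list from the top on every check, returning to SIP01 as soon as it recovers.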
Have you tried setting Dual Registration to Yes?
I assume the phone will then try to register with both servers. As long as that does not confuse them (it should not), you may end up registered with the primary server whenever it is reachable, with no reboot or failover-server blocking necessary. I have never tried it myself, so my assumptions may be wrong.
Even if it works, it may not solve outgoing calls; I'm unsure where the INVITE will be sent in that case.
Unfortunately, setting Dual Registration to Yes has no effect.
The phone still registers with only one server and doesn't automatically fall back. Dual Registration might only take effect when an Alternative Proxy is configured.
Thanks for your input. We'll work around this problem by blocking the phone's public IP to force it to fall back when needed.