What's the expected failover time for a SIP phone to fail over to the third member of a CM group if the first two are unavailable? Is the "SIP Station KeepAlive Interval" the only setting that would affect this?
Thanks for the link. For testing we simulated an outage; of the 1500 phones, only the 100 or so that are SCCP failed over in under 5 minutes. It took somewhere between 15 and 20 minutes before all the remaining phones failed over to the 3rd node (all nodes are using the 7500 user OVA).
Do the same phones fail over quickly (the 100 or so) and slowly (the rest of the 1500) in subsequent tests?
If yes, does factory resetting one of the phones that is slow to fail over cause it to fail over quickly?
If yes, those phones may not trust the configuration file they've been given due to an issue with the Initial Trust List.
If it's a random assortment of phones that are slow to fail over each time, you'll probably have to grab packet captures from the phone and CUCM along with the SDL traces. Once you have those, the question is who's slow to respond: the phone or CUCM?
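To make the "who's slow" question concrete, here is a rough Python sketch, assuming you have already pulled the SIP message timestamps out of the capture by hand. The `SipEvent` record, the function names, and the sample values are all made up for illustration; nothing here reads a pcap or an SDL trace. The heuristic is crude: each time the conversation changes sides, the gap is charged to the side that finally spoke.

```python
from dataclasses import dataclass


@dataclass
class SipEvent:
    t: float       # seconds since the start of the capture
    sender: str    # "phone" or "cucm"
    summary: str   # free-text label, e.g. "REGISTER" or "response"


def response_gaps(events):
    """Return, per side, the delays before that side answered the other side."""
    gaps = {"phone": [], "cucm": []}
    ordered = sorted(events, key=lambda e: e.t)
    for prev, cur in zip(ordered, ordered[1:]):
        # Only count a gap when the speaker changes; back-to-back messages
        # from the same side (e.g. retransmissions) are skipped.
        if cur.sender != prev.sender:
            gaps[cur.sender].append(cur.t - prev.t)
    return gaps


def slower_side(events):
    """Return (side with the largest single gap, worst gap per side)."""
    gaps = response_gaps(events)
    worst = {side: max(vals, default=0.0) for side, vals in gaps.items()}
    return max(worst, key=worst.get), worst


# Hypothetical timeline: CUCM answers quickly, the phone waits ~2 minutes.
events = [
    SipEvent(0.00, "phone", "REGISTER"),
    SipEvent(0.05, "cucm", "response"),
    SipEvent(120.00, "phone", "REGISTER"),
    SipEvent(120.04, "cucm", "response"),
]
side, worst = slower_side(events)
print(side, worst)
```

A large gap on the phone side would point toward the phone's own timers or retry logic; a large gap on the CUCM side would point toward the server. Treat it as a first pass only, since a long phone-side gap can also just be the normal keepalive interval.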
PS: You said you "simulated" a failure. I have been tripped up in the past by phones not failing over quickly when the failure is unidirectional (i.e. I only blocked traffic with an ACL in one direction). I suggest ensuring traffic is failing in both directions so that both sides fully recognize that the TCP socket is closed and can move on.
We only had the chance to test once; our outage window expired before we could repeat the test.
So the simulation was likely flawed (I see that now, after the fact, having realized why Unity never failed over either). Pub and Sub 1 are in building A, Sub 2 in building B. There are a number of remote sites that have connections to both buildings via VPNs or other means. The connection from building A to the remote sites was severed, so they should only have been able to reach building B (building A to building B never lost connection).
What I did notice, which I thought was odd (and possibly speaks to your point about a simulation not being a true indicator), was that if I looked in CUCM (after breaking the connection to building A for the remote sites) and searched for a particular phone, I would see it registered on Sub 2 briefly, then not registered, and then a few minutes later registered to Sub 2 again.
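If there is a chance to repeat the test, that registered / not registered / registered pattern could be logged and counted rather than eyeballed. A minimal Python sketch, assuming you note the phone's status at intervals yourself (for example from the device search page); the sample format and the `count_flaps` name are hypothetical, and the code does not query CUCM:

```python
def count_flaps(samples):
    """Count registered -> unregistered -> registered cycles.

    samples: list of (time_in_seconds, node_name_or_None) in time order,
    where None means the phone showed as not registered at that moment.
    """
    flaps = 0
    was_registered = False   # has the phone been seen registered yet?
    dropped = False          # has it dropped since it was last registered?
    for _, node in samples:
        if node is not None:
            if was_registered and dropped:
                flaps += 1
            was_registered = True
            dropped = False
        elif was_registered:
            dropped = True
    return flaps


# Hypothetical observations matching the behavior described above.
observed = [(0, "Sub 2"), (60, None), (240, "Sub 2")]
print(count_flaps(observed))
```

A phone that registers once and stays registered counts zero flaps; repeated flaps on the same phones across tests would be worth lining up against the capture timestamps.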