Recently, while testing failover of SIP dial peers on an IOS gateway with the first target down, I found that the caller ID name changed to "pending" on the second dial peer. Maxing out the buffer timer for invites had no effect. I also tried knocking invite retries down to 0 to reduce any cumulative time spent retrying the first matched dial peer. The call is sourced from a PRI on the same gateway. I ran some debugs, which confirmed that the issue is with the gateway, as the invite to the second dial peer was changed to:
instead of the proper CID name that came in on the PRI, which was sent correctly in the attempted invite to the first dial peer target.
After opening a TAC case, the engineer came back with the information below, and then followed up saying that the TCL script is unavailable, that there is no workaround, and since this is a severity 6 bug that's been open since 2006, I can't expect them to fix it. He also mentioned that it appears to be specific to calls sourced from a POTS dial peer.
What gives? Why don't they consider this a concern and fix it? Does nobody else have a PRI terminating on a voice gateway with SIP dial peers in a hunt configuration for failover?
After researching the issue, I found that a bug is already filed for it; as of now this is not a supported configuration:
Bug ID : CSCsd93147
Two dial-peers pointing to SIP trunks on two CUCMs.
If the local CUCM is switched off and dial-peer hunting sends the call to the next CUCM, the gateway sends "RemotePartyID" as pending.
If the dial-peer to the local CUCM is itself shut down, then the gateway forwards the correct calling party name to the remote CUCM.
You can open the bug in the following link :
The bug is currently in a held state, and the workaround is for the DE to provide a Tcl script that can be added to the IOS.
A custom Tcl script that extracts the calling name needs to be created (not something TAC can help with, sorry).
The issue is that within the IOS, the gateway does not buffer the calling name information from the facility message that comes in. This means the information is handled correctly if the call completes on the first matched dial-peer. However, if that dial-peer fails, the gateway does not retain the information for use when a secondary dial-peer is hit.
If there is only one call manager in the picture, then the "buffer limit" command can be used. However, if there are multiple call manager servers reachable by different dial-peers, a Tcl script developed by the DE is required.
The bug was opened in 2006; however, it is a severity 6 bug.
Hence, please contact your Account Manager to push the DE for the Tcl script, which has to be copied to the router's flash to resolve the issue.
There is no workaround for this issue that can be provided as of now.
Thanks George - excellent workaround! When I was testing, I noticed that if you shut down the dial peer it would work correctly (for obvious reasons). I had the CUCM SIP trunks set up to monitor the gateway with pings, but not vice versa. The only caveats would be a slight delay before it starts working on failover (I lowered the retries since CUCM and the gateway are on the same switch) and extra SIP messages to weed through when debugging (I left the interval timers the same because of this), but that's not a big deal at all for a working solution.
Thanks again for the super idea!
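For anyone finding this thread later: George's reply is not reproduced above, so the following is only a sketch of what the workaround appears to be, judging by the mention of pings, retries, and interval timers. The assumption is that it uses SIP OPTIONS keepalives on the gateway's dial peers, so that IOS busies out a dial peer whose target stops answering and the call matches the second dial peer directly instead of hunting to it. The dial-peer tags, destination pattern, IP addresses, and timer values are all hypothetical placeholders, not values from the original posts.

```
! Hypothetical example: two SIP dial peers in a hunt group, one per CUCM.
! Tags, pattern, and addresses are placeholders.
dial-peer voice 100 voip
 description Primary CUCM
 preference 1
 destination-pattern 5...
 session protocol sipv2
 session target ipv4:192.0.2.10
 ! Assumed workaround: send OPTIONS pings so the dial peer is busied out
 ! when the target is unreachable. A lower retry count shortens detection.
 voice-class sip options-keepalive up-interval 60 down-interval 30 retry 2
!
dial-peer voice 101 voip
 description Secondary CUCM
 preference 2
 destination-pattern 5...
 session protocol sipv2
 session target ipv4:192.0.2.20
 voice-class sip options-keepalive up-interval 60 down-interval 30 retry 2
```

With this in place, a down target should behave like the "dial peer shut down" case described in the bug notes, where the calling name is passed correctly. The trade-offs are the ones mentioned above: a short delay before the dial peer is marked down, and extra OPTIONS traffic in SIP debugs.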