
CSCvz27796 - MRA Registration Flaps

Adam Pawlowski
VIP Alumni
I spent a few hours on this one myself the other day in our lab and wanted to make this post in case it helps someone else in this situation.
 
After upgrading our UCM cluster to 14.0SU1, Jabber would register but then not work, and the same went for IP phones over MRA. You could occasionally make a call that lasted about 30 seconds before it was disconnected and the endpoint stopped working.
 
Logging from the UCM shows endpoint unregistered, and lists a reason code of 6:
 
ccm02test:Jan 27 11:24:00 ccm02test Jan 27 2022 16:24:00.191 UTC :  %UC_CALLMANAGER-3-EndPointUnregistered: %[DeviceName=SEP2834A2FFFFA1][IPAddress=1.1.2.2][Protocol=SIP][DeviceType=685][Description=Adam Pawlowski 8861][Reason=6][IPAddrAttributes=0][LastSignalReceived=SIPConnControlInd][MRAStatus=0][AppID=Cisco CallManager][ClusterID=CCM01TEST-Cluster][NodeID=ccm02test]: An endpoint has unregistered
ccm02test.log:Jan 27 11:24:00 ccm02test : : 762: ccm02test Jan 27 2022 16:24:00.189 UTC :   %UC_CALLMANAGER-3-EndPointUnregistered: %[DeviceName=SEP2834A28FFFA1][IPAddress=1.1.2.2][Protocol=SIP][DeviceType=685][Description=Adam Pawlowski 8861][Reason=6][IPAddrAttributes=0][LastSignalReceived=SIPConnControlInd][MRAStatus=0][AppID=Cisco CallManager][ClusterID=CCM01TEST-Cluster][NodeID=ccm02test]: An endpoint has unregistered
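If you have a lot of these alarms to go through, the bracketed fields are easy to pull apart. This is a rough sketch only, assuming the alarm keeps the `[Key=Value]` layout shown above; `parse_alarm` is a made-up helper name, not anything provided by UCM.

```python
import re

# Matches the bracketed key=value pairs in a UCM alarm line, e.g. [Reason=6].
# Assumption: keys are plain letters and values never contain "]".
FIELD_RE = re.compile(r"\[([A-Za-z]+)=([^\]]*)\]")


def parse_alarm(line):
    """Return the bracketed fields of one alarm line as a dict of strings."""
    return dict(FIELD_RE.findall(line))


# Sample taken from the first alarm line in this post.
sample = (
    "%UC_CALLMANAGER-3-EndPointUnregistered: "
    "%[DeviceName=SEP2834A2FFFFA1][IPAddress=1.1.2.2][Protocol=SIP]"
    "[DeviceType=685][Description=Adam Pawlowski 8861][Reason=6]"
    "[IPAddrAttributes=0][LastSignalReceived=SIPConnControlInd][MRAStatus=0]"
    "[AppID=Cisco CallManager][ClusterID=CCM01TEST-Cluster][NodeID=ccm02test]"
    ": An endpoint has unregistered"
)

fields = parse_alarm(sample)
print(fields["DeviceName"], fields["Reason"], fields["LastSignalReceived"])
```

Filtering on `Reason` and `LastSignalReceived` across a whole syslog file would make it quick to see whether every affected device is reporting the same thing.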
The UCM shows a socket error, regarding the connection being closed, before it tears down the device registration:
UCM_Trace.png
 
If you enable developer logging in the Expressway, you can see the socket being closed:
tvcs: UTCTime="2022-01-27 16:11:55,757" Module="developer.sip.transport" Level="DEBUG" CodeLocation="ppcmains/sip/siptrnsp/SipSockMap.cpp(384)" Method="SipSockMap::closeSocketAndFreeEntry" Thread="0x7f5e10fbb640": LocalId="459358807" LocalAddr="['IPv4''TCP''1.1.2.2:26979']" RemoteAddr="['IPv4''TCP''1.1.2.60:5060']" Type="SIP_SOCKTYPE_TCP_OUTG" Detail="Closing Socket" Reason="Manually forced disconnect"
In a Wireshark trace, this is a FIN coming from the Expressway towards the UCM.
 
The Expressway log shows, just before the socket is closed, that this is due to a STUN reply timeout:
 
2022-01-27T10:07:38.996-05:00 tvcs: UTCTime="2022-01-27 15:07:38,995" Module="developer.sip.transport" Level="INFO" CodeLocation="ppcmains/sip/siptrnsp/siptrnspsfsm.cpp(4334)" Method="::SIPTRNSP_doStunTimeoutSocketClose" Thread="0x7f5e10fbb640": freeing cucm socket connection
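To spot these two events in a larger Expressway developer log, something like the following could work. It is a sketch that only assumes the `Key="Value"` layout and the two `Method` values seen in the lines above; `parse_expressway_line` and `classify` are hypothetical names, and the sample lines are shortened copies of the ones in this post.

```python
import re

# Key="Value" pairs as they appear in the Expressway log lines above.
KV_RE = re.compile(r'(\w+)="([^"]*)"')

# Method values copied from the two log lines in this post.
CLOSE_METHOD = "SipSockMap::closeSocketAndFreeEntry"
STUN_METHOD = "::SIPTRNSP_doStunTimeoutSocketClose"


def parse_expressway_line(line):
    """Return the Key="Value" pairs of one log line as a dict."""
    return dict(KV_RE.findall(line))


def classify(line):
    """Label a line as 'stun_timeout', 'forced_close', or None."""
    method = parse_expressway_line(line).get("Method")
    if method == STUN_METHOD:
        return "stun_timeout"
    if method == CLOSE_METHOD:
        return "forced_close"
    return None


# Shortened versions of the log lines shown above.
sample_close = (
    'tvcs: UTCTime="2022-01-27 16:11:55,757" Module="developer.sip.transport" '
    'Level="DEBUG" Method="SipSockMap::closeSocketAndFreeEntry" '
    'Detail="Closing Socket" Reason="Manually forced disconnect"'
)
sample_stun = (
    'tvcs: UTCTime="2022-01-27 15:07:38,995" Module="developer.sip.transport" '
    'Level="INFO" Method="::SIPTRNSP_doStunTimeoutSocketClose" '
    'Thread="0x7f5e10fbb640": freeing cucm socket connection'
)

print(classify(sample_close), classify(sample_stun))
```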
 
 

Disabling STUN keepalive on the Expressway-C does not fix this; the workaround in the bug does. In your UCM, under Devices -> Expressway-C, change the name of the Expressway-C entry from the hostname to the FQDN. This assumes you let the entry auto-populate.
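If you have several Expressway-C entries to review, a quick heuristic can flag which names are still short hostnames. This is only an illustration: `looks_like_fqdn` is a made-up helper, the names below are placeholders, and it does not query UCM or DNS in any way.

```python
import ipaddress


def looks_like_fqdn(name):
    """Heuristic: True if name has two or more dot-separated labels and is not an IP."""
    name = name.rstrip(".")
    try:
        ipaddress.ip_address(name)
        return False  # an IP address is neither a hostname nor an FQDN
    except ValueError:
        pass
    labels = name.split(".")
    return len(labels) >= 2 and all(labels)


# Placeholder names, not taken from any real system.
for entry in ["expc01", "expc01.example.com", "10.0.0.5"]:
    print(entry, looks_like_fqdn(entry))
```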

 

The phones and Jabber have their socket open to the Expressway-E, so they don't reflect that they've lost registration. A call attempt from them may work briefly if the device happens to be registered at that moment, but otherwise results in a fast busy.

 

As others are moving to CSR 14 I figured I'd post this to save some trouble and hair pulling if this comes up in search.

Sorry for posting this in edit blocks but the community kept giving me blob conversion errors (?) and eating my post.


Thank you for sharing this @Adam Pawlowski. This is exactly what we saw when we were on CM14SU1 for a short period on our pre-production system, and TAC never could find the reason for it, so in the end we rolled back to 12.5SU5. Will check the details in the bug note and try to upgrade once more. Again, a big thank you; I appreciate you taking the time to share this!





Follow-up on this: we upgraded our pre-production system to CM 14 again yesterday and put the workaround in place. With this we're not seeing the issues we had back in late 2021 when we last tried this version on this system. Again, a BIG thank you @Adam Pawlowski for taking the time to share this with the community.





Adam Pawlowski
VIP Alumni

Superstar @Anthony Holloway suggested I take another crack at posting with some follow up on the alarm code and reason. I have some cheat sheets and crib notes on various reason codes, but I often forget that I can go right to the source on these, like I did in this case.

 

Alarms like EndPointUnregistered have detail information available in the serviceability section of the UCM:

 

alarm_list.png

I'm guilty of forgetting this is here, and then searching "reason code" or something on the forums and hoping I get the right answer. In this case I didn't find much of anything for 6 and assumed it was related to the TCP connection closing after I saw what was happening.

Breaking out the telescope to read what it means from the search above:

reason_6.png

In this case I guess I can infer that 'ConnectivityError' generally includes socket errors, connection closing, or being closed. If the device had actually unregistered and sent an alarm itself, another number could possibly have been used here. But with this being MRA the devices don't actually know they've lost registration. Jabber also was logged with a reason of 6 in this scenario.

MichaelWells5
Level 1

Excellent post. The fix is changing the Expressway-C name from hostname to FQDN.

Once the Expressway-Cs were updated to FQDN in CUCM, there were no more disconnected calls from the MRA phone, in this case an 8845.

Prior to the change, outbound calls from the 8845 would connect and media would be established on both sides, but roughly 10 seconds in, the 8845 would drop the call. Further outbound attempts then got a fast busy. The call would still show as connected on the called device.

Thanks again for this very helpful post Adam Pawlowski.

 

 

DubininSN
Level 1

Hello, I have the same issue with Cisco Jabber for PC and the mobile phone client, but no problem with a DX70 over MRA. Your fix does not help in my case, and I have no idea why. Expressway 15.0, CUCM 14 SU3. Maybe somebody knows how to fix this issue?

Djeten
Level 1

I have the same issue on CP7821 and CP8851 phones. Phones register through MRA but then unregister after about 30 sec. The status messages of the phone show 'Reset-Restart'.

Changing the name of expressway C in CUCM does not resolve my problem...

If the phone is resetting, then perhaps there's another operation it is trying to do, or the UCM is pushing it off.
Can you look at the UCM log for the unregister alarm and see if the reason code provides more information?

Hi Adam,

thanks for your reply. I opened a TAC case for this. They had me remove the CUCM configuration on the expressway C and reconfigure it. After that, the phones stay registered.

We still have problems using Jabber over MRA though. On the expressway C, I see that the user is authorized, but the Jabber client goes back to the login screen after that.

Hopefully the Jabber client log can help point you to why it would do that, but the flow and places to check may be somewhat different between LDAP/Basic and oAuth, single cluster/multi cluster, etc.

So a Refresh on the EXP-C was not enough? Or did you even do one? I didn't at first, and learned that it needs to be done.
Step 1 On the EXP-C, go to Configuration > Unified Communications > Unified CM servers and click Refresh servers.
Step 2 For Unity Connection, go to Configuration > Unified Communications > Unity Connection servers and click Refresh servers (if it exists).
It seems easy to spot, as the EXP-C seems to show the old software version info until Saved or Refreshed.

That is a step to take whenever you upgrade the UC products on the other side of the Expressway. It doesn't seem to do much most of the time, but as you note, refreshing may have caused the Expressway to interact with this feature, which it would not otherwise have known to do.