11-25-2012 01:47 AM - edited 03-16-2019 02:22 PM
Hi,
I upgraded CUCM from 4.2.3 to 7.1.5, and the upgrade succeeded. However, once the subscriber was also upgraded and brought live, calls to 2 or 3 sites stopped going through. When we take the subscriber down, the calls go through with no problem. There are 2 nodes in the cluster. The subscriber's replication status stays at 0.
Regards,
Fawaz
11-25-2012 01:56 AM
Hi Fawaz,
Good day! :-)
Sounds like you got yourself a dbreplication issue. RTMT Status of 0 is not healthy. Can you send me "utils dbreplication runtimestate" from the Publisher CLI please?
Also, please go through my blog and tech-talk video on dbreplication to help you get familiarized with the dbreplication commands:
Regards,
Harmit.
11-25-2012 02:13 AM
Hi Harmit,
You again. Great!! Thanks for the reply.
The result of the utils dbreplication runtimestate is given below.
admin:utils dbreplication runtimestate
DB and Replication Services: ALL RUNNING
Cluster Replication State: BROADCAST SYNC Started on 1 server(s) at: 2012-11-25-15-07
Sync Progress: 0 tables sync'ed out of 427
Sync Errors: NO ERRORS
DB Version: ccm7_1_5_30000_1
Number of replicated tables: 427
Cluster Detailed View from PUB (2 Servers):
                         PING        REPLICATION    REPL. DBver& REPL. REPLICATION SETUP
SERVER-NAME IP ADDRESS   (msec) RPC? STATUS         QUEUE TABLES LOOP? (RTMT) & details
----------- ------------ ------ ---- -----------    ----- ------ ----- -----------------
AUCCM01     10.11.10.235 0.048  Yes  Connected      0     match  Yes   ( 3) PUB Setting Subs
AUCCM02     10.11.10.236 Failed No   Active-Dropped 0     ?      No    ( ?) Setup in Progress
Thanks for the link as well. I have now started to rebuild the subscriber server once again. This is the third time I am rebuilding the subscriber today!! There was a network connectivity issue in between: the server was not able to connect to the gateway.
One more thing I would like to ask: I once read somewhere that Cisco CUCM does not support network port teaming. Here we have set up network port teaming on the CUCM subscriber server, but not on the Publisher. Please advise.
Regards,
Fawaz
11-25-2012 02:36 AM
Hi Fawaz,
Ok, if you still have dbreplication issues after rebuilding the Subscriber, let me know and we can look into it.
As for the NIC teaming, it is definitely supported.
Per the SRND:
"NIC Teaming for Network Fault Tolerance
The NIC teaming feature allows a server to be connected to the Ethernet via two NICs and, therefore, two cables. NIC teaming prevents network downtime by transferring the workload from the failed port to the working port. NIC teaming cannot be used for load balancing or increasing the interface speed.
Hewlett-Packard (HP) and IBM server platforms with dual Ethernet network interface cards can support NIC teaming for Network Fault Tolerance.
Hewlett-Packard server platforms with dual Ethernet network interface cards can support NIC teaming for Network Fault Tolerance with Cisco Unified CM 5.0(1) or later releases.
IBM server platforms with dual Ethernet network interface cards can support NIC teaming for Network Fault Tolerance with Cisco Unified CM 6.1(2) and later releases."
And the command to do that as per the CUCM OS CLI Reference Guide is:
"set network failover
This command enables and disables Network Fault Tolerance on the Media Convergence Server network interface card.
Command Syntax
failover {enable | disable}
Parameters
enable enables Network Fault Tolerance.
disable disables Network Fault Tolerance.
Requirements
Command privilege level: 1
Allowed during upgrade: No"
The referenced guides:
SRND
http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/srnd/7x/callpros.html#wp1043624
CLI Ref
http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/cli_ref/7_1_2/cli_ref_712.html#wp40017
HTH.
Regards,
Harmit.
11-25-2012 03:47 AM
Hi Harmit,
Thanks for the reply.
I saw your video. It's a very good one. Thanks for sharing it.
I have now rebuilt the server. After the rebuild, it still showed a replication error: the Sub showed a replication value of 0. So I ran utils dbreplication clusterreset, and on top of that I ran utils dbreplication reset all.
While dbreplication clusterreset was running, the replication value settled at 2 for both nodes for a while, and then the sub stayed at 2 while the pub changed to 4. I restarted the publisher first and then the subscriber.
After both restarts, the replication status of the pub is 2 and the sub is 0. How do I move ahead from here?
Regards,
Fawaz.
11-25-2012 04:05 AM
Hi Fawaz,
Thanks! Glad to hear you liked it. As for this issue, sounds like the syscdr db is not getting created on the Sub. I would request you to run the following reports in the Unified Reporting Tool GUI page and attach them here so I can see what's going on:
Unified CM Cluster Overview
Unified CM Database Status
Also, kindly attach the latest output of:
++ "utils dbreplication runtimestate", "utils diagnose test", "show tech network hosts", "utils service list" from the Pub
++ "utils diagnose test", "show tech network hosts", "utils service list", "utils network connectivity" from the Sub.
Regards,
Harmit.
11-25-2012 05:10 AM
11-25-2012 05:11 AM
11-25-2012 05:12 AM
Hi Harmit,
Remaining attachments.
Regards,
Fawaz.
11-25-2012 05:37 AM
Hi Fawaz,
Thank you for the logs. From the gathered info, I can see that almost everything is good and healthy. The one thing that stands out is a connectivity issue between the Pub and Sub. You can see this in the "utils diagnose test" and "utils network connectivity" outputs:
test - validate_network : Error, intra-cluster communication is broken, unable to connect to [10.11.10.235]
admin:utils network connectivity
This command can take up to 3 minutes to complete.
Continue (y/n)?y
Running test, please wait ...
.
Test failed with AUCCM01: Could not send/receive UDP packets.
The strange thing is that in the reports you uploaded, I see that the connectivity test is successful:
This shouldn't be the case if the same test failed from the CLI. I am not sure whether the connectivity was temporarily down while you ran commands in between, or whether there is a network issue between the nodes that makes the link keep going down and coming back up.
You can also see that the runtimestate reflects the following:
                         PING        REPLICATION    REPL. DBver& REPL. REPLICATION SETUP
SERVER-NAME IP ADDRESS   (msec) RPC? STATUS         QUEUE TABLES LOOP? (RTMT) & details
----------- ------------ ------ ---- -----------    ----- ------ ----- -----------------
AUCCM01     10.11.10.235 0.048  Yes  Connected      0     match  Yes   ( 3) PUB Setting Subs
AUCCM02     10.11.10.236 0.262  No   Active-Dropped 0     ?      No    (
Active-Dropped means that the Cluster Manager is denying access, the DB is down, or the entire server is down. Since the Cluster Manager is responsible for keeping a check on the network connectivity (which is failing), that is why it shows this replication status.
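If it helps when you are comparing several runs, here is a minimal Python sketch (my own illustration, not a Cisco tool) that reads the per-node rows of a pasted "utils dbreplication runtimestate" output and lists the nodes that need attention. It assumes the single-line row layout shown in this thread and assumes that an RTMT state of 2 is the healthy value; the function names and the regular expression are hypothetical.

```python
import re

# Sample rows copied from the runtimestate output posted in this thread.
SAMPLE = """\
AUCCM01 10.11.10.235 0.048 Yes Connected 0 match Yes ( 3) PUB Setting Subs
AUCCM02 10.11.10.236 Failed No Active-Dropped 0 ? No ( ?) Setup in Progress
"""

# Assumed row layout: name, IP, ping, RPC?, status, queue, tables, loop?, (RTMT) details.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ip>\S+)\s+(?P<ping>\S+)\s+(?P<rpc>Yes|No)\s+"
    r"(?P<status>\S+)\s+(?P<queue>\S+)\s+(?P<tables>\S+)\s+(?P<loop>Yes|No)\s+"
    r"\(\s*(?P<rtmt>\d+|\?)\s*\)\s*(?P<details>.*)$"
)


def parse_rows(text):
    """Return one dict per line that matches the assumed row layout."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def nodes_needing_attention(text, healthy_state="2"):
    """List node names whose RTMT state is not healthy or whose RPC check failed."""
    return [
        row["name"]
        for row in parse_rows(text)
        if row["rtmt"] != healthy_state or row["rpc"] != "Yes"
    ]


if __name__ == "__main__":
    for name in nodes_needing_attention(SAMPLE):
        print(f"{name}: check connectivity and replication")
```

On the output posted above, both nodes are flagged: the Pub because its state is 3, and the Sub because RPC fails and its state is unknown.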
Can you confirm if both the Pub and Sub are connected to the same switch and are in the same location? What is the MTU for both these servers? The default is 1500. You can confirm by running "show network eth0 detail" on the Pub and "show network failover" on the Sub, since fault tolerance is configured on it. By the way, why do you have it enabled on the Sub and not the Pub? Ideally, you want to keep this consistent for all nodes in the cluster. Until this connectivity issue is sorted out, no matter what command you run, the replication won't stay up.
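To make the consistency point concrete, here is a small Python sketch that compares the network settings of two nodes and reports only the ones that differ. The values below are hypothetical and would be copied by hand from the CLI outputs; the setting names and the function name are my own and do not come from any Cisco tool.

```python
def config_differences(pub, sub):
    """Return {setting: (pub_value, sub_value)} for every setting that differs."""
    keys = sorted(set(pub) | set(sub))
    return {
        key: (pub.get(key), sub.get(key))
        for key in keys
        if pub.get(key) != sub.get(key)
    }


# Hypothetical values, hand-copied from each node's CLI output.
pub_settings = {"mtu": 1500, "failover": "disabled"}
sub_settings = {"mtu": 1500, "failover": "enabled"}

if __name__ == "__main__":
    for setting, (pub_value, sub_value) in config_differences(pub_settings, sub_settings).items():
        print(f"{setting}: Pub={pub_value} Sub={sub_value}")
```

An empty result means the two nodes match on every setting you recorded.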
Hope the above helps.
Regards,
Harmit.
11-25-2012 06:29 AM
Hi Harmit,
Thanks for such a good explanation.
Yes, both servers are connected to the same switch and are in the same location. I did not change the MTU values during server installation. I have attached the outputs of show network eth0 from the publisher and show network failover from the subscriber.
I have not enabled fault tolerance anywhere.
So how do I proceed further? Any advice?
Regards,
Fawaz
11-25-2012 09:45 AM
Hi Fawaz,
Thanks for the info. I might have gotten confused with something you mentioned earlier then.
++ Can you check to see and make sure there is no speed/duplex mismatch between the servers and the switchports? Kindly pull "show network eth0 detail" output from both Pub and Sub. The Pub's output was not detailed.
++ Check for any interface errors on the switchports where the 2 servers are connected.
++ Try restarting the Cluster Manager service: "utils service restart Cluster Manager" on both servers.
++ Try disabling CSA on both servers. The command is "utils csa disable". Please note: The server will reboot when you disable CSA:
admin:utils csa disable
Need a system restart for changes to take effect
Enter "yes" to continue and restart or any other key to abort
:
After doing so, run "utils diagnose test" and "utils network connectivity" from the Sub and upload those outputs. We need to make sure the connectivity issue is sorted out before we attempt to fix the replication.
Regards,
Harmit.
11-26-2012 01:28 AM
Hi Harmit,
Thanks for the reply.
I worked with TAC and resolved it. There were actually two issues.
1) The security passwords given on the Publisher and Subscriber during installation were different. These passwords are typed in by the customer, who does not share those details with us. TAC found via root login that the hash values of the security password on the two servers were different. So we changed both security passwords.
2) There was an NTP issue. The upgrade of the servers was done on a lab network, where we had given a different NTP address. When the servers moved to production, we changed the NTP address to a real one. TAC found that there was an NTP certificate issue, which TAC also resolved later.
Now the subscriber is up and all the calls are going through. There is another issue now. All the phones are registered, but their status under Device > Phone shows as unknown. Both the IP address and the status are unknown. There are no problems with any of the calls, though. I need to look into that later today.
Regards,
Fawaz
11-26-2012 01:45 AM
Hi Fawaz,
Great info! Thanks for sharing (+5 for doing so)!
I would have ideally needed access myself to resolve this for you, good thing you opened a TAC case for it. For the NTP issue, what I can tell you is that if there is a time difference of over 30 minutes between the nodes, the replication will go down.
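As a rough illustration of that 30-minute rule, here is a minimal Python sketch that compares two clock readings and says whether the difference is over the limit. The timestamps are hypothetical values you would read off each node yourself, and the function name is my own.

```python
from datetime import datetime, timedelta

# Threshold mentioned above: replication goes down beyond 30 minutes of difference.
MAX_SKEW = timedelta(minutes=30)


def skew_exceeds_limit(pub_time, sub_time, limit=MAX_SKEW):
    """Return True when the two clock readings differ by more than the limit."""
    return abs(pub_time - sub_time) > limit


if __name__ == "__main__":
    # Hypothetical readings taken from the Pub and the Sub at the same moment.
    pub_clock = datetime(2012, 11, 26, 12, 0, 0)
    sub_clock = datetime(2012, 11, 26, 12, 45, 0)
    print(skew_exceeds_limit(pub_clock, sub_clock))
```

The comparison uses the absolute difference, so it does not matter which node is ahead.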
For the other issue, sounds cosmetic in nature. Please go to the Unified Serviceability page --> Tools --> Control Center - Network Services --> Restart "Cisco RIS Data Collector" service on both nodes and check to see if the phone status shows up correctly now.
HTH.
Regards,
Harmit.
11-26-2012 06:59 AM
Hi Harmit,
Thanks for the reply.
I restarted the Cisco RIS Data Collector service on both nodes, but that did not fix it. I then raised another TAC case, and after analysis the engineer said that it's a bug: CSCtd91156.
Regards,
Fawaz.