12-10-2009 05:08 AM - edited 03-15-2019 08:45 PM
We have four CUCM servers in a PUB-SUB setup:
1 PUB and 3 SUBs running CUCM 6.0.
Everything was working fine, but for the last few days we have been facing an issue:
Phones registered to Subscriber A can communicate with phones on both Subscriber B and Subscriber C.
Phones registered to Subscriber B can communicate with Subscriber A but not Subscriber C.
Phones registered to Subscriber C cannot communicate with Subscriber B, but they can communicate with Subscriber A.
There are times when Subscriber C phones can communicate with Subscriber B phones, but it's very rare.
I have verified the DB replication status and it's 2.
I have tried inserting phones from the Subscriber C admin page and it works.
There are no latency or packet-drop issues between subscribers or between the publisher and the subscribers.
Any suggestions to solve the issue?
12-10-2009 06:24 AM
Sounds like SDL issues. I would recommend a cluster reboot to resync the SDL.
HTH
java
If this helps, please rate
www.cisco.com/go/pdihelpdesk
12-10-2009 09:06 AM
Hi,
Java, thanks for your reply. I will let you know after the cluster reboot.
Regards,
Iftikhar Ahmed
12-10-2009 09:57 AM
Hi,
I have rebooted the cluster, publisher first and then all subscribers.
The problem still persists.
Any further suggestions?
12-10-2009 10:02 AM
If replication is fine, you might need to look at the SDL layers to find the issue.
Look for version mismatches or errors in the app log.
HTH
java
If this helps, please rate
www.cisco.com/go/pdihelpdesk
12-10-2009 10:05 AM
Hi,
How can I enable the logs? Or are you asking me to enable a trace for that?
Regards,
Iftikhar
12-10-2009 10:15 AM
Hi,
Since the problem started, I have been getting this error message in the application logs:
Dec 10 23:04:19 CMSUB-ISL-IPT Error Cisco CallManager : 26: Dec 10 18:04:19.6 UTC : %CCM_CALLMANAGER-CALLMANAGER-3-SDLLinkOOS: SDL link to remote application out of service. Local node ID:12 Local Application ID.:100 Remote IP address of remote application:10.100.200.12 RemoteNodeID:1 Remote application ID.:100 Unique Link ID.:12:100:1:100 Cluster ID:StandAloneCluster Node ID:CMSUB-ISL-IPT
Any suggestions?
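When this alarm recurs, it can help to pull out which remote node each SDLLinkOOS entry points at. The following is only a minimal sketch: the field labels are copied from the alarm text quoted above, while the function name `parse_sdl_link_oos` and the dictionary keys are hypothetical names chosen for illustration.

```python
import re

# Field labels copied from the SDLLinkOOS alarm text quoted in this thread.
# The dictionary keys are illustrative names, not Cisco terminology.
_FIELDS = {
    "local_node_id": r"Local node ID:(\d+)",
    "remote_ip": r"Remote IP address of remote application:(\d+(?:\.\d+){3})",
    "remote_node_id": r"RemoteNodeID:(\d+)",
    "link_id": r"Unique Link ID\.:([\d:]+)",
}


def parse_sdl_link_oos(line):
    """Return a dict of link details, or None if the line is not an SDLLinkOOS alarm."""
    if "SDLLinkOOS" not in line:
        return None
    result = {}
    for name, pattern in _FIELDS.items():
        match = re.search(pattern, line)
        # A missing field is recorded as None rather than raising.
        result[name] = match.group(1) if match else None
    return result


sample = (
    "%CCM_CALLMANAGER-CALLMANAGER-3-SDLLinkOOS: SDL link to remote application "
    "out of service. Local node ID:12 Local Application ID.:100 "
    "Remote IP address of remote application:10.100.200.12 RemoteNodeID:1 "
    "Remote application ID.:100 Unique Link ID.:12:100:1:100 "
    "Cluster ID:StandAloneCluster Node ID:CMSUB-ISL-IPT"
)
info = parse_sdl_link_oos(sample)
print(info)
```

Running this over an exported application log and counting entries per `remote_ip` would show whether the link drops always involve the same peer or move between nodes.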
12-10-2009 04:33 PM
This confirms what Java was referring to earlier. You have a DB replication issue within the cluster. Basically, while the SDL link to that node is out of service, the replication for the entire cluster is considered bad. With the Linux appliance, I've found that a reboot typically works, but depending on how long the issue has been going on and what triggered the problem, it may not. You'll need to take a look at the utils dbreplication commands. There are some that you run on each server first to see what happens; if that doesn't work, you can initiate a repair operation from the publisher server. This is all done via the CLI. Note that the repair operation may (and likely will) take quite a while to complete. Once you start it, let it run and wait for it to complete. Then use RTMT to verify DB replication. You can also use the command line on each server or Unified Reporting. Personally, I prefer RTMT.
If you need further info, just shout.
12-10-2009 04:48 PM
Hi,
I have started the repair process for all nodes. I am still wondering why it's a dbreplication issue if RTMT is saying that the dbreplication status is 2 for all servers.
I will let you know once db replication is done.
Thanks for replying to my queries.
Regards,
Iftikhar Ahmed
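For context on the "status is 2" question: the replication state counter shown in RTMT is usually read against a small table of values in which 2 is the healthy state. The sketch below encodes that table as it is commonly documented; the wording of each meaning is an assumption to confirm against the documentation for your release, and the node names in the usage lines are hypothetical.

```python
# Meanings as commonly documented for the RTMT replication state counter.
# Assumption: confirm the exact wording against the docs for your CUCM release.
REPLICATE_STATE = {
    0: "initializing",
    1: "replicate count incorrect",
    2: "good",
    3: "bad (tables suspect or out of sync)",
    4: "setup failed",
}


def nodes_needing_attention(states):
    """Given {node_name: state}, return the sorted names whose state is not 2."""
    return sorted(node for node, state in states.items() if state != 2)


# Hypothetical node names and values, for illustration only.
cluster = {"PUB": 2, "SUB-A": 2, "SUB-B": 2, "SUB-C": 3}
flagged = nodes_needing_attention(cluster)
print(flagged, [REPLICATE_STATE[cluster[n]] for n in flagged])
```

A value of 2 on every node is the state the counter reports as good, so seeing SDL alarms alongside it is a reasonable thing to question.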
12-10-2009 07:20 PM
EXACTLY what procedure did you follow for the cluster reboot?
There is a specific method for performing a cluster reboot.
HTH
java
if this helps, please rate
www.cisco.com/go/pdihelpdesk
12-10-2009 09:39 PM
Hi,
It seems to be working so far after the dbreplication repair on all nodes.
As far as the cluster reboot is concerned, I rebooted the publisher first, and when it was online again I rebooted the subscribers one by one.
What is the correct way of rebooting a cluster, by the way?
Thanks to both of you for support.
Regards,
Iftikhar
12-10-2009 11:50 PM
Hi,
I am observing the same problem again,
and I am seeing the same error in the RTMT application log:
Dec 11 12:44:27 | CMSUB-ISL-IPT | Error | Cisco CallManager | : 62: Dec 11 07:44:27.901 UTC : %CCM_CALLMANAGER-CALLMANAGER-3-SDLLinkOOS: SDL link to remote application out of service. Local node ID:12 Local Application ID.:100 Remote IP address of remote application:10.100.200.11 RemoteNodeID:3 Remote application ID.:100 Unique Link ID.:12:100:3:100 Cluster ID:StandAloneCluster Node ID:CMSUB-ISL-IPT |
After the db replication repair it was solved, but now, after a few hours, I am facing the same issue again.
Any comments?
Regards,
Iftikhar Ahmed
12-11-2009 12:03 AM
One more thing: every time the problem starts, I see the following in the application log just before the SDL error:
Dec 11 12:01:01 | CMSUB-ISL-IPT | Notice | logrotate | ALERT exited abnormally with [1] |
Could this be what is causing the issue?
Regards,
Iftikhar Ahmed
12-11-2009 06:33 AM
On the surface, I would say that the logrotate alert is likely not related to the SDL issue. However, if I were you, I would run a battery of tests to verify that you don't have an issue on the network, either logical or physical. For some reason, the Pub either can't or thinks it can't communicate at the SDL layer with this particular subscriber, so a lot of what you need to test depends on your topology. You may also have a bad NIC or a failed teaming configuration; there are lots of possibilities. Java and I have touched on the primary fixes: 1 being a reboot and 2 being a replication repair if all else fails. Some things to look for are a bad NIC, packet loss over the network, QoS, etc. If you can rule those out, then like Java said earlier, you may need to start digging into the SDL layers. I don't know your experience level, but in your case I would recommend that you open a TAC case with Cisco. Other factors could be whether this subscriber was recently added to the cluster or whether a recent upgrade on the nodes is causing an issue.
Step 1) TAC case
Step 2) Examine potential network-related issues
Step 3) Goes with #2: try to rule out any hardware issues, such as a bad NIC
That's my best at the moment.
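As one small piece of the network checks in Step 2, a TCP reachability probe can be run from a workstation on the same network segment as the servers. This is a rough sketch under stated assumptions: TCP 8002 is the port commonly listed for CallManager-to-CallManager SDL traffic (verify this for your release), the function name `tcp_reachable` is hypothetical, and a probe from a workstation does not exercise the exact node-to-node path, so a success here does not rule out a problem between the subscribers themselves.

```python
import socket

# Assumption: TCP 8002 is the port commonly listed for intracluster
# CallManager SDL traffic. Verify against the port guide for your release.
SDL_PORT = 8002


def tcp_reachable(host, port=SDL_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example (addresses taken from the alarms quoted in this thread):
# for ip in ("10.100.200.11", "10.100.200.12"):
#     print(ip, tcp_reachable(ip))
```

Repeating the probe at intervals and logging the results could help correlate intermittent failures with the timestamps of the SDLLinkOOS alarms.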
12-11-2009 09:59 PM
I would recommend running a diagnostic test on the server to check for SDL link or connectivity issues.
Command: utils diagnose test
If the test fails at any point, send me the output of the test.