Solved: Re: Phones registered to different subscriber cannot communicate

iftikharsyed_2 · ‎12-10-2009

We have 4 CUCM servers in a publisher-subscriber cluster:

1 publisher and 3 subscribers, running CUCM 6.0.

Everything was working fine, but for the last few days we have been facing an issue:

Phones registered to Subscriber A can communicate with phones on both Subscriber B and Subscriber C.

Phones registered to Subscriber B can communicate with Subscriber A but not Subscriber C.

Phones registered to Subscriber C cannot communicate with Subscriber B, but they can communicate with Subscriber A.

There are times when Subscriber C phones can communicate with Subscriber B phones, but it's very rare.

I have verified the DB replication status and it is 2.

I have tried to insert phones from the Subscriber C Admin page and it works.

There are no latency or packet-drop issues between subscribers or between the publisher and the subscribers.

Any suggestions to solve the issue?

Jaime Valencia · ‎12-10-2009

Sounds like SDL issues. I would recommend a cluster reboot to resync the SDL.

HTH

java

If this helps, please rate

www.cisco.com/go/pdihelpdesk

iftikharsyed_2 · ‎12-10-2009

Hi,

Java, thanks for your reply. I will let you know after the cluster reboot.

Regards,

Iftikhar Ahmed

iftikharsyed_2 · ‎12-10-2009

Hi,

I have rebooted the cluster, publisher first then all subscribers.

Problem still persists.

Any further suggestions?

Jaime Valencia · ‎12-10-2009

If replication is fine, you might need to look at the SDL layers to find the issue.

Look for version mismatches or errors in the app log.

HTH

java

If this helps, please rate

www.cisco.com/go/pdihelpdesk

iftikharsyed_2 · ‎12-10-2009

Hi,

How can I enable the logs? Or are you asking me to enable traces for that?

Regards,

Iftikhar

iftikharsyed_2 · ‎12-10-2009

Hi,

Since the problem started, I have been getting this error message in the application logs:

Dec 10 23:04:19 CMSUB-ISL-IPT Error Cisco CallManager : 26: Dec 10 18:04:19.6 UTC : %CCM_CALLMANAGER-CALLMANAGER-3-SDLLinkOOS: SDL link to remote application out of service. Local node ID:12 Local Application ID.:100 Remote IP address of remote application:10.100.200.12 RemoteNodeID:1 Remote application ID.:100 Unique Link ID.:12:100:1:100 Cluster ID:StandAloneCluster Node ID:CMSUB-ISL-IPT

Any suggestions?
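
For anyone tracking how often this alarm fires and against which peer, the fields of the SDLLinkOOS message above can be pulled out with a small script. This is a minimal sketch in Python: the regular expression is inferred only from the log lines quoted in this thread, and the names `SDL_OOS` and `parse_sdl_oos` are made up for illustration, so other CUCM releases may format the message differently.

```python
import re

# Pattern inferred from the alarm text quoted in this thread (an assumption,
# not an official format description).
SDL_OOS = re.compile(
    r"SDLLinkOOS: .*?"
    r"Local node ID:(?P<local_node>\d+) .*?"
    r"Remote IP address of remote application:(?P<remote_ip>[\d.]+) "
    r"RemoteNodeID:(?P<remote_node>\d+) .*?"
    r"Node ID:(?P<hostname>\S+)\s*$"
)


def parse_sdl_oos(line):
    """Return the alarm fields as a dict, or None if the line is not an SDLLinkOOS alarm."""
    match = SDL_OOS.search(line)
    return match.groupdict() if match else None


# The alarm line exactly as posted above.
SAMPLE = (
    "Dec 10 23:04:19 CMSUB-ISL-IPT Error Cisco CallManager : 26: "
    "Dec 10 18:04:19.6 UTC : %CCM_CALLMANAGER-CALLMANAGER-3-SDLLinkOOS: "
    "SDL link to remote application out of service. Local node ID:12 "
    "Local Application ID.:100 Remote IP address of remote application:10.100.200.12 "
    "RemoteNodeID:1 Remote application ID.:100 Unique Link ID.:12:100:1:100 "
    "Cluster ID:StandAloneCluster Node ID:CMSUB-ISL-IPT"
)

if __name__ == "__main__":
    print(parse_sdl_oos(SAMPLE))
```

Running every exported application log line through `parse_sdl_oos` and counting the `remote_ip` values would show whether the link drops against one peer or several.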

David Hailey · ‎12-10-2009

This confirms what Java was referring to earlier. You have a DB replication issue within the cluster. Basically, while the SDL link to that node is out of service, replication for the entire cluster is considered bad. With the Linux appliance, I've found that a reboot typically works, but depending on how long the issue has been going on and what triggered the problem, it may not. You'll need to take a look at the utils dbreplication commands. There are some that you run on each server first; see what happens, and if that doesn't work, you can initiate a repair operation from the publisher server. This is all done via the CLI. Note that the repair operation may (and likely will) take quite a while to complete. Once you start it, let it run and wait for it to complete. Then use RTMT to verify DB replication. You can also use the command line on each server or Unified Reporting. Personally, I prefer RTMT.

If you need further info, just shout.

iftikharsyed_2 · ‎12-10-2009

Hi,

I have started the repair process for all nodes. I am still wondering why it's a dbreplication issue if RTMT is saying that the dbreplication status is 2 for all servers.

I will let you know once db replication is done.

Thanks for replying to my queries.

Regards,

Iftikhar Ahmed
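
On the question of what a status of 2 means: the value RTMT shows is the Replicate_State counter. The sketch below lists the meanings commonly documented for that counter and flags any node that is not at 2. The descriptions are a paraphrase and are not taken from this thread, and the node names are hypothetical, so check them against the documentation for your release.

```python
# Meanings commonly documented for the RTMT Replicate_State counter
# (paraphrased; verify against the documentation for your CUCM release).
REPLICATE_STATE = {
    0: "initializing: replication setup has not started or is still in progress",
    1: "setup script has fired, but the number of replicates is incorrect",
    2: "good: replication is set up",
    3: "bad: tables are mismatched between nodes",
    4: "replication setup did not succeed",
}


def unhealthy_nodes(states):
    """Given {node_name: state}, return {node_name: description} for every node not at state 2."""
    return {
        node: REPLICATE_STATE.get(state, f"unknown state {state}")
        for node, state in states.items()
        if state != 2
    }


if __name__ == "__main__":
    # Hypothetical readings, one per server in a four-node cluster.
    readings = {"pub": 2, "subA": 2, "subB": 2, "subC": 3}
    print(unhealthy_nodes(readings))
```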

Jaime Valencia · ‎12-10-2009

EXACTLY what was the procedure you followed for the cluster reboot???

There is a method to perform a cluster reboot

HTH

java

if this helps, please rate

www.cisco.com/go/pdihelpdesk

iftikharsyed_2 · ‎12-10-2009

Hi,

It seems to be working so far after the dbreplication repair on all nodes.

As far as the cluster reboot is concerned, I rebooted the publisher first, and when it was online again I rebooted the subscribers one by one.

What is the correct way of rebooting a cluster, by the way?

Thanks to both of you for support.

Regards,

Iftikhar

iftikharsyed_2 · ‎12-10-2009

Hi,

I am observing the same problem again.

Again I am seeing the same error in the RTMT application log:

Dec 11 12:44:27 CMSUB-ISL-IPT Error Cisco CallManager : 62: Dec 11 07:44:27.901 UTC : %CCM_CALLMANAGER-CALLMANAGER-3-SDLLinkOOS: SDL link to remote application out of service. Local node ID:12 Local Application ID.:100 Remote IP address of remote application:10.100.200.11 RemoteNodeID:3 Remote application ID.:100 Unique Link ID.:12:100:3:100 Cluster ID:StandAloneCluster Node ID:CMSUB-ISL-IPT

After the db replication repair it was solved, but now after a few hours I am facing the same issue again.

Any comments?

Regards,

Iftikhar Ahmed

iftikharsyed_2 · ‎12-11-2009

One more thing: every time the problem starts, I see the following in the application log just before the SDL error:

Dec 11 12:01:01 CMSUB-ISL-IPT Notice logrotate ALERT exited abnormally with [1]

Is it something that is creating the issue?

Regards,

Iftikhar Ahmed

David Hailey · ‎12-11-2009

On the surface, I would say that the logrotate alert is likely not related to an SDL issue. However, if I were you, I would run a battery of tests to verify that you don't have an issue on the network, either logical or physical. For some reason, the Pub either can't or thinks it can't communicate at the SDL layer with this particular subscriber. So, a lot of what you would need to test depends on topology, etc. You may also have a bad NIC or a failed teaming configuration; there are lots of possibilities. Java and I have touched on the primary fixes: 1 being a reboot and 2 being a replication repair if all else fails. Some things to look for are a bad NIC, packet loss over the network, QoS, etc. If you can rule those things out, then like Java said earlier you may need to start digging into the SDL layers. In your case, I don't know your experience level, but I would recommend that you open a TAC case with Cisco. Other factors could be whether this subscriber was recently added to the cluster or whether there was a recent upgrade on the nodes that may be causing an issue.

Step 1) TAC case

Step 2) Examine potential network-related issues

Step 3) Goes with #2, try to rule out any hardware issues such as bad NIC

That's my best at the moment.
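
As one small piece of the network checks in Step 2, a plain TCP connect test can at least show whether a port on a peer answers. This is a minimal sketch in Python, not a Cisco tool: the helper name `tcp_reachable` is made up, and the assumption that intra-cluster SDL traffic uses TCP port 8002 should be confirmed against the port usage guide for your release before relying on it.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example use (commented out so the block has no side effects when imported):
# the two addresses are the remote IPs quoted in the SDLLinkOOS alarms in this
# thread, and 8002 is the assumed SDL port.
# for peer in ("10.100.200.11", "10.100.200.12"):
#     print(peer, tcp_reachable(peer, 8002))
```

A check like this only exercises the path from the machine it runs on, so a result from an admin workstation does not prove that the two subscribers can reach each other. It also says nothing about intermittent loss, which fits the pattern of the link working for a few hours and then dropping.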

vivkalra · ‎12-11-2009

I would recommend that you run a test on the server to check for any SDL link or connectivity issues.

Command: utils diagnose test

If the test fails at a certain point, send me the output of the test.