Solved: DBreplication problem; All servers do not have a replication count of 348.

smsangola · ‎11-18-2010

Hi all,

I am encountering a problem with replication between these two servers.

All servers do not have a replication count of 348.

All servers have a good replication status.

Server	Number of Replicates Created	Replicate_State
10.182.9.227	0	2 - good
10.182.9.228	348	2 - good

Publisher:

admin:show perf query class "Number of Replicates Created and State of Replication"
==>query class :

- Perf class (Number of Replicates Created and State of Replication) has instances and values:
ReplicateCount -> Number of Replicates Created = 0
ReplicateCount -> Replicate_State = 2

Subscriber:

admin:show perf query class "Number of Replicates Created and State of Replication"
==>query class :

- Perf class (Number of Replicates Created and State of Replication) has instances and values:
ReplicateCount -> Number of Replicates Created = 348
ReplicateCount -> Replicate_State = 2
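
For anyone who wants to compare the two outputs by script rather than by eye, here is a minimal Python sketch. It assumes the CLI text has been pasted into strings; the function names (`parse_perf_output`, `compare_servers`) are hypothetical helpers, not part of any Cisco tool.

```python
import re

# Matches lines such as "ReplicateCount -> Replicate_State = 2".
LINE_RE = re.compile(r"^\s*ReplicateCount\s*->\s*(.+?)\s*=\s*(\d+)\s*$")


def parse_perf_output(text):
    """Return {counter name: integer value} for one server's pasted CLI output."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def compare_servers(outputs):
    """outputs maps a server label to its raw CLI text.

    Returns (parsed, counts_match, states_match).
    """
    parsed = {name: parse_perf_output(text) for name, text in outputs.items()}
    counts = {p.get("Number of Replicates Created") for p in parsed.values()}
    states = {p.get("Replicate_State") for p in parsed.values()}
    return parsed, len(counts) == 1, len(states) == 1


# Sample text copied from the outputs in this post.
PUBLISHER = """\
- Perf class (Number of Replicates Created and State of Replication) has instances and values:
ReplicateCount -> Number of Replicates Created = 0
ReplicateCount -> Replicate_State = 2
"""

SUBSCRIBER = """\
- Perf class (Number of Replicates Created and State of Replication) has instances and values:
ReplicateCount -> Number of Replicates Created = 348
ReplicateCount -> Replicate_State = 2
"""

if __name__ == "__main__":
    parsed, counts_match, states_match = compare_servers(
        {"publisher": PUBLISHER, "subscriber": SUBSCRIBER}
    )
    print(parsed)
    print("replicate counts match:", counts_match)
    print("replicate states match:", states_match)
```

With the two outputs above, the states agree but the counts do not, which is the mismatch the status report is complaining about.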

Can anyone help me out? I am having lots of problems with this cluster and am unable to add phones, run backups, or upgrade. I suspect this replication issue is the main cause of everything.

I tried the 'utils dbreplication repair all' command, but nothing changed.

I just ran the 'utils dbreplication stop' command and am waiting for it to finish (it takes between 5 and 60 minutes); then I'll run 'utils dbreplication reset' to see if anything changes.

Thanks.

smsangola · ‎11-18-2010

Attached is the report generated from Unified CM Database Status.

smsangola · ‎11-18-2010

Nothing worked.

I also tried 'utils dbreplication dropadmindb', then stop, then repair, but that made no difference either.

Clifford McGlamry · ‎11-18-2010

You may need to do the dropadmindb followed by a cluster reset.

When you scroll down farther on the dbreplication report, do you have any other errors in the section below the one you pasted in on your first post? If those have issues, they need to be fixed first.

smsangola · ‎11-19-2010

Yes there are other problems. I have the report attached in the previous post in XML format. I am not on site now to provide a screenshot.

If you open the XML file and search for 'Dropped', you'll see the following output of 'cdr list serv'

cdr list serv

10.182.9.227 (the publisher)

SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------



10.182.9.228 (the subscriber)
SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_cmngpucm1_ccm         2 Active   Dropped         0 Nov 18 14:59:49
g_cmngpucm2_ccm         3 Active   Local           0
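
To spot a dropped connection without reading the table by hand, the listing can be parsed with a few lines of Python. This is only a sketch: `parse_cdr_list_serv` and `unhealthy_servers` are hypothetical names, and treating "Local" and "Connected" as the only healthy STATUS values is an assumption to check against your own system.

```python
def parse_cdr_list_serv(text):
    """Parse the table printed by 'cdr list serv' into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: name, numeric id, state, status, numeric queue,
        # then an optional "connection changed" timestamp. The header row and
        # the dashed rule fail these checks and are skipped.
        if len(parts) < 5 or not parts[1].isdigit() or not parts[4].isdigit():
            continue
        rows.append({
            "server": parts[0],
            "id": int(parts[1]),
            "state": parts[2],
            "status": parts[3],
            "queue": int(parts[4]),
            "changed": " ".join(parts[5:]),
        })
    return rows


# Assumption: these are the STATUS values that indicate a working connection.
HEALTHY_STATUSES = {"Local", "Connected"}


def unhealthy_servers(rows):
    """Return the names of servers whose STATUS is not considered healthy."""
    return [row["server"] for row in rows if row["status"] not in HEALTHY_STATUSES]


# The subscriber's view, copied from the report above.
SUBSCRIBER_VIEW = """\
SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_cmngpucm1_ccm         2 Active   Dropped         0 Nov 18 14:59:49
g_cmngpucm2_ccm         3 Active   Local           0
"""

if __name__ == "__main__":
    rows = parse_cdr_list_serv(SUBSCRIBER_VIEW)
    print(unhealthy_servers(rows))
```

An empty listing, like the publisher's above, simply parses to an empty list, which is itself worth flagging.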

kusatija · ‎11-19-2010

Hello,

Please try the following in the exact sequence given below:

>> utils dbreplication stop on the subscriber

>> once it is stopped on the subscriber, utils dbreplication stop on the publisher

>> wait a few minutes for it to finish

>> utils dbreplication dropadmindb on the publisher

>> wait for it to finish

>> utils dbreplication dropadmindb on the subscriber

>> wait a few minutes for it to finish

>> utils dbreplication reset all on the publisher

Give it some time and then run the following:

show perf query class "Number of Replicates Created and State of Replication"

on both the servers and post the results.
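
Since the order matters here, it can help to keep the sequence as data and print it as a checklist before touching the cluster. The sketch below is my reading of the steps in this post (it starts with the stop on the subscriber, which the wording above implies); it does not connect to anything, and `Step`, `RESET_SEQUENCE` and `render_checklist` are hypothetical names.

```python
from collections import namedtuple

# One CLI command to run on one node.
Step = namedtuple("Step", ["node", "command"])

# The order described in this post; wait for each command to finish
# before moving on to the next one.
RESET_SEQUENCE = [
    Step("subscriber", "utils dbreplication stop"),
    Step("publisher", "utils dbreplication stop"),
    Step("publisher", "utils dbreplication dropadmindb"),
    Step("subscriber", "utils dbreplication dropadmindb"),
    Step("publisher", "utils dbreplication reset all"),
]

# The check to run on both servers after giving the reset some time.
VERIFY_COMMAND = (
    'show perf query class '
    '"Number of Replicates Created and State of Replication"'
)


def render_checklist(sequence=RESET_SEQUENCE):
    """Return the steps as numbered, human-readable lines."""
    lines = []
    for number, step in enumerate(sequence, start=1):
        lines.append(f"{number}. [{step.node}] {step.command} (wait for it to finish)")
    lines.append(f"{len(sequence) + 1}. [both] {VERIFY_COMMAND}")
    return lines


if __name__ == "__main__":
    for line in render_checklist():
        print(line)
```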

Also post screenshots of the Unified Reporting output from the Call Manager. Once we have the screenshots as well as the outputs, we can guide you further.

HTH

Kunal

smsangola · ‎11-19-2010

Hi,

Thanks for the hints guys.

I ran the 'utils dbreplication clusterreset' after stopping the dbreplication on both servers.

I also ran the dropadmindb and then reset all as mentioned above, and got the results below.

Right now, I have Replicate Count = 348 on both servers, but Replicate State = 4 on the publisher and 3 on the subscriber.

I ran the reset all command 1 hour ago, and (supposedly) it's still working in the background now.

Attached are screenshots and the output of the 'utils dbreplication status'. And notice that there are 0 processed rows in every part of the output!

So now the problem has become Replicate State = 4 on the publisher.

What do you advise the next step to be?

Regards.

Clifford McGlamry · ‎11-19-2010

Don't do anything just yet.

Wait until they both go to 4. If that happens, do a dbreplication stop on both servers, drop the dbadmin again on the subscriber, and then do the cluster reset again.

What you are seeing happens often, and then everything snaps to a 2. Wait until they are all 4s or all 2s before you do anything.
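
The rule of thumb above boils down to a small decision function. Here is a rough Python sketch of it; `next_action` and its return strings are hypothetical, and the only state meaning taken from this thread is that 2 is "good".

```python
def next_action(states):
    """Decide what to do from the Replicate_State value of every server.

    states is a list of integers, one per server in the cluster.
    Returns one of "done", "redo_reset" or "wait".
    """
    if not states:
        raise ValueError("need the Replicate_State of at least one server")
    unique = set(states)
    if unique == {2}:
        # Every server reports 2 (good): nothing left to do.
        return "done"
    if unique == {4}:
        # Every server reports 4: stop replication on both servers, drop the
        # admin db on the subscriber again, then run the cluster reset again.
        return "redo_reset"
    # Mixed or in-between values: leave the cluster alone and check later.
    return "wait"


if __name__ == "__main__":
    print(next_action([4, 3]))  # publisher 4, subscriber 3
    print(next_action([4, 4]))
    print(next_action([2, 2]))
```

For the 4-and-3 situation described in the previous post, this returns "wait", which matches the advice not to do anything yet.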

smsangola · ‎11-19-2010

Hi Clifford,

Actually, they both went to 4 (with 348 replicates created on both; it was 0 and 348 before). Then I restarted the publisher, after which the state stayed at 4 on the publisher and went to 3 on the subscriber. So I tried the dropadmindb again, then reset all, and nothing changed.

A question: why would I need to do a dropadmindb and a clusterreset again? Wasn't it supposed to work the first time?

Thanks a lot for your help.

Clifford McGlamry · ‎11-19-2010

It should have, but sometimes... things just don't behave correctly.

Usually this happens because the CDR is hosed. Dropping it and resetting it usually straightens it out. If it doesn't work after the second time, call TAC.


kusatija · ‎11-20-2010

Hello,

We need to look at the Unified Reporting output to check whether the etc files and the sql files have the correct entries; it does look like that could be an issue.

To be honest, there are a lot of things we could look at.

But we would need access or a WebEx session to fix it. Try rebooting the pub and the sub once they are off production to see if that fixes the issue.

If that does not fix it, I would strongly suggest you open a TAC case if this is affecting production, so that we can get access and fix the issue.

Regards

Kunal

smsangola · ‎11-22-2010

Thanks a lot, guys. After rebooting the pub, waiting until it was up and running, and then rebooting the sub, the status is now 2 on both servers.