Solved: Cluster over WAN for ICM version 8

bilalghayad · ‎12-05-2011

Hi All;

I have built side A (RGR, HDS, CVP PG and CUCM PG) and have received new servers to build side B. I am going to install ICM on side B. Regarding the databases (HDS, AW, Logger): do I just have to create them on side B and connect side B, and it will sync with side A? Or is there something else I have to do?

Side A and side B are linked over the WAN, but it is a reliable, very stable fiber link with 1 Gbps of bandwidth.

Thanks for the help in advance.

Regards

Bilal

geoff · ‎12-05-2011

Logger B will synch with logger A.

The HDS on side B will try to catch up, but given the retention time of a logger (typically 14 days) it may never be the same as the HDS on side A.

The AW is built and destroyed on demand - it's always accurate.
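The effect of the retention window can be illustrated with a small calculation. This is only a sketch under the assumption of a 14-day Logger retention; the function names and the sample dates are hypothetical and are not part of any ICM tool.

```python
from datetime import datetime, timedelta


def earliest_backfill(now: datetime, logger_retention_days: int = 14) -> datetime:
    """Oldest timestamp a freshly built HDS could still pull from its Logger."""
    return now - timedelta(days=logger_retention_days)


def hds_gap_days(hds_a_oldest: datetime, now: datetime,
                 logger_retention_days: int = 14) -> int:
    """Days of history held on HDS A that a new HDS B cannot replicate."""
    cutoff = earliest_backfill(now, logger_retention_days)
    return max(0, (cutoff - hds_a_oldest).days)


if __name__ == "__main__":
    # Hypothetical dates: HDS A has data since 1 Sep 2011, HDS B is built on 5 Dec 2011.
    print(hds_gap_days(datetime(2011, 9, 1), datetime(2011, 12, 5)))
```

Anything older than the cutoff exists only on HDS A, which is why the two HDS databases may never become identical through replication alone.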

Regards,

Geoff

bilalghayad · ‎12-05-2011

Dear Geoff;

The old data, from before the 14 days, has already migrated from the Logger to the HDS. So how will the HDS on side B get this old data?

About B syncing with A: is it always B that syncs with A, or is it because A is already live that B will sync with A? In other words, if B were live and A joined, would A sync with B?

Regards

Bilal

geoff · ‎12-05-2011

"The old data, from before the 14 days, has already migrated from the Logger to the HDS."

The really old data has migrated from Logger A to its HDS. But it's gone from Logger B so it can't be on its HDS.

Why did you take so long to bring up the second HDS?

It will synchronize either way - you should have tested this by now.

Test 1: LoggerA, RouterA, RouterB, LoggerB

Test 2: LoggerB, RouterB, RouterA, LoggerA

Regards,

Geoff

bilalghayad · ‎12-12-2011

Dear Geoff;

What actually happened is that we are building a disaster recovery site over the WAN. We started our work at the site that will be the disaster recovery site for production, and we built one leg of the IPCC there.

In the disaster recovery (DR) site, we built the first leg (side A, CVP Call Server, CVP VXML, CUCM Publisher, CUCM Subscriber) and it is running well.

Coming back to the main site, I got three fresh new servers, so I decided to use them for RGRB, CMPGB and VRUPGB and sync them with what I built in the DR site (which has side A). Again, so far I am not touching the production servers.

All I need now is to check that the cluster over WAN works, so I installed and configured RGRB, CMPGB and VRUPGB.

My questions here:

1) For the Logical, Physical and Peripheral IDs on side B (CMPGB and VRUPGB), I have to use the same ones as on side A, correct?

2) For the CG and CTIOS Server, do I use the same ports on side B that I used on side A?

3) About the HDS: can I back up the HDS at side A and restore it to the HDS at side B?

Any special advice for this scenario?

Many thanks for your kind support.

Regards

Bilal

geoff · ‎12-12-2011

1. Yes, of course.

2. Normally, CG1A listens on 42027 and CG1B listens on 43027. CTIOS listens on the same port on each side for the clients, and for the cross-connection between the two sides.

3. Not normally, but there is a document in the Wiki about doing that.

https://supportforums.cisco.com/docs/DOC-15290

9. From the HDS2 server, import HDS1 database over to HDS2 (use the database copy from LoggerB's local machine).

10. Run the following query against HDS2 database.

truncate table Recovery

11. Start HDS2 services. Allow enough time for data to get replicated (Logger -> HDS).

12. At the end of this exercise, verify BOTH max recovery keys and max DateTime match between LoggerA, HDS1, LoggerB, HDS2.

SQL Command: max(DateTime), and max (RecoveryKey).
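The comparison in step 12 can be sketched as follows. This is a minimal illustration that assumes the max(DateTime) and max(RecoveryKey) values have already been read from each database by hand; the `Snapshot` type, the `find_mismatches` helper and the sample values are hypothetical.

```python
from datetime import datetime
from typing import Dict, NamedTuple


class Snapshot(NamedTuple):
    """Values read from one database: max(DateTime) and max(RecoveryKey)."""
    max_datetime: datetime
    max_recovery_key: float


def find_mismatches(snapshots: Dict[str, Snapshot],
                    reference: str = "LoggerA") -> Dict[str, Snapshot]:
    """Return every database whose values differ from the reference database."""
    ref = snapshots[reference]
    return {name: snap for name, snap in snapshots.items() if snap != ref}


if __name__ == "__main__":
    # Hypothetical values; HDS2 is still catching up in this example.
    current = Snapshot(datetime(2011, 12, 12, 10, 30), 5000123.0)
    behind = Snapshot(datetime(2011, 12, 12, 10, 0), 5000100.0)
    print(find_mismatches({
        "LoggerA": current, "HDS1": current, "LoggerB": current, "HDS2": behind,
    }))
```

An empty result means all four databases agree; any entry returned names a database that has not yet caught up.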

Regards,

Geoff

bilalghayad · ‎12-12-2011

Dear Geoff;

I am afraid I am facing a problem because I created the side B database fresh, so it is not able to sync, and that is causing a problem related to the LastUpdateKey, as I explain below. How can I overcome this?

I have now configured RGRB, CMPGB and CVPPGB; everything is connected and I have started the services.

Let us talk about RGRB and RGRA:

When I installed RGRB, I created the database using ICMDBA, so the side B database is empty.

In the Router A mdsproc, I saw these messages:

Communication with peer Synchonizer established.

Synchonizer switching to active duplex operation.

The Logger A config logger is reporting the following (maybe because I created a fresh database on side B):

Synchronization failed because the LastUpdateKey on sideB is not present in the sideA Config Message Log.

As shown in the picture below:

Also on RGRB, I notice that the Logger B configlogger keeps restarting, as shown in this screenshot:

Also, in the RGRB mdsproc I see the message below:

Regards

Bilal

geoff · ‎12-12-2011

This can be fixed. I am reluctant to give advice on the forum.

This came up once before

https://supportforums.cisco.com/thread/271063

My advice there was to open a TAC case. They will help you and this must be done correctly.

Regards,

Geoff

bilalghayad · ‎12-13-2011

Dear Geoff;

I followed the link and it is fixed on RGRB.

Now I am facing a problem with the CUCM PG and CVP PG: they are not coming up, and the error says the network name is no longer available.

Could it be because so far I have one CVP Call Server, one Publisher and one Subscriber, so both sides (A and B) connect to the same CVP Call Server and the same Subscriber, using the same Peripheral ID, Physical ID, Logical ID and PIM number?

Regards

Bilal

bilalghayad · ‎12-13-2011

Dear Geoff;

This is to confirm that it is working, but it only worked when I stopped the services on side A.

The VRU PG on side B is configured to connect to the same CVP Call Server as side A. When side A is up and I try to start the VRU PG on side B, a failure occurs and I see a Windows message saying the server will restart after a specific number of minutes and seconds.

Could this be because both VRU PGs (A and B) are connected to the same CVP Call Server? When side A is down and I start side B, there is no problem and things work fine.

I do not face the restart problem on the CUCM PG of side B when side A is also up, but the services do not come up, and the error says the specified network name is no longer available, as shown in the picture in my previous post. Is that normal? Or is it because both are connected to the same CUCM?

Regards

Bilal

geoff · ‎12-13-2011

"Could this be because both VRU PGs (A and B) are connected to the same CVP Call Server?"

That's exactly how it's supposed to be.

Both A and B connect to the CVP Call Server which is listening on port 5000. Of course, they don't try to connect at the same time - the PIM is overseen by the PG process, and through the MDS they talk to each other, and they know which side should really bind to the CVP Call Server (the active side).

JTAPI is different in a production environment - where the JTAPI gateway on PIM A talks to the CTI Manager on 1 subscriber; and the JTAPI gateway on PIM B talks to the CTI Manager on a different subscriber; the PG controls the active side from above.

Regards,

Geoff

bilalghayad · ‎12-13-2011

Is it possible to start the VRU PG on side A and side B at the same time, both trying to connect to the same CVP Call Server?

If side A is active and I start the VRU PG on side B, the server shows a Windows message that it will restart after a specific number of minutes and seconds, and of course the VRU PIM service on side B is stopped. That is why I am asking whether the VRU PIM on side A and the VRU PIM on side B can both be connected to the same CVP Call Server on port 5000, or whether that is not possible. If it is possible, why is this (the restart) happening to me?

Currently, if I need to start the VRU PIM on side B, I have to stop the VRU PIM on side A; otherwise the restart message appears and the server restarts automatically after the specified time. This restart message reminds me of the viruses that sometimes show such a message before restarting the machine.

Thanks for the help in advance.

Regards

Bilal

geoff · ‎12-13-2011

You must have the PIM for the VRU PG on side A connect to the Call Server on port 5000 and the PIM on side B connect to the same Call Server on the same port, but only one will connect. That will be controlled by whichever PG is designated as the active side. If you have them both running, and bounce the connected PIM, the other one will bind to the port.
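The behaviour described above can be pictured with a toy model. This is only an illustrative sketch of "two PIMs configured for one port, one connected at a time"; the `DuplexedPims` class is hypothetical and does not reflect how the PG, MDS or PIM processes are actually implemented.

```python
from typing import Dict, Optional


class DuplexedPims:
    """Toy model: both PIMs target the same Call Server port, one holds the connection."""

    def __init__(self) -> None:
        self.running: Dict[str, bool] = {"A": False, "B": False}
        self.connected: Optional[str] = None  # side currently bound to the Call Server

    def start(self, side: str) -> None:
        """Start a PIM; it connects only if no side holds the connection yet."""
        self.running[side] = True
        if self.connected is None:
            self.connected = side

    def bounce(self, side: str) -> None:
        """Restart one PIM; if it held the connection, a running peer takes over."""
        if self.connected == side:
            peer = "B" if side == "A" else "A"
            self.connected = peer if self.running[peer] else side


if __name__ == "__main__":
    pair = DuplexedPims()
    pair.start("A")
    pair.start("B")          # B runs but stays idle
    print(pair.connected)
    pair.bounce("A")         # the connected PIM is bounced
    print(pair.connected)
```

In this model, starting the second side never causes a failure; it simply stays idle until the connected side is bounced.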

If you get the message that it is going to stop the server and restart it, type "stopshut" at a command prompt or in the "Run" box and it will stop the shutdown/restart process.

The fact that you are getting this means that something is not set up properly in the configuration of the PGs. Go over how you have set these up - look at the public and private connections, because you have something amiss.

Regards,

Geoff

bilalghayad · ‎12-14-2011

Dear Geoff;

Regarding the CUCM PGs: if one side is active, I see the other side's processes as IDLE and its mdsproc as OO. Is that normal, or should I see standby? I tried stopping the services for the CUCM PG on side A and found that the CUCM PG on side B became active; I also tried stopping the CUCM PG on side B and found that the CUCM PG on side A became active. However, I never see the word "standby" in the processes. I see Active and Idle, while the mdsproc on the Idle side shows OO (out of service) and on the Active side shows InSvr. Is that fine?

About the VRU PG: I checked the configuration and the IP addresses for the visible and private networks (low and high), and everything is fine. If the VRU PG service on side A is stopped, the VRU PG on side B can start and work fine, and if the VRU PG on side B is stopped, the VRU PG on side A can start and work fine. The only problem is that when the VRU PGs on both sides (A and B) are started at the same time (and they are configured to connect to the same CVP Call Server), the problem occurs and the restart message appears. One thing I tried changing in the configuration is the Peripheral ID (not the Logical or Physical ID): I tried using the same Peripheral ID for VRU PG sides A and B, and I also tried using a different Peripheral ID for side B than for side A (there are already 4 Peripheral IDs configured under the same VRU PG in Configuration Manager, because production has 4 CVP Call Servers). The same problem still happens.

I investigated the VRU PG problem and found that the following sequence occurs:

If I start the VRU PG on side B (for example) and its services become Active, then when I start the VRU PG on side A, the vrupim process hits an error (very quickly; I was not able to take a screenshot), stops, and the service disappears. Then an error appears in the mdsproc saying that there was an error in the VRU process and that it stopped, after which the services stop and the restart message appears. So the first process to fail is the vrupim (and this process should not be related to the other side's configuration, am I correct?).

Thanks for the help.

Regards

Bilal