Hi, a question for those Tidal clients who have upgraded to TES 6.1 and are using two client managers:
I logged a case about multi-datacenter DR, and I was told that this version and earlier ones assume that the failover architecture is all in the same datacenter, and that 6.2 may address multi-datacenter architecture.
We have been running on two datacenters since 5.3 (making sure we had redundant components in each datacenter), so I was surprised to learn this. I was not the Tidal admin when we instituted our failover environment in 5.3, so I don't have the background history.
In the 5.3 environment, with just the db and master (and no client manager to contend with), we did not notice a difference in performance when the master was not in the same datacenter as our database. So we are hoping it won't be an issue in 6.1. But every client is different and unique, so I can't really say that what worked or didn't work for us will be the same for you.
Our TES 6.1 DEV architecture uses only one datacenter, and we are already seeing performance issues from time to time but cannot figure out the source, so it has become more important for us to know whether multi-datacenter will aggravate the issue even more. We are also looking into trying out tools like AppDynamics or New Relic to help us determine where the slowness is consistently stemming from.
The RPC errors continue to be a head scratcher. I don't know if our load balancer (GSS) is the culprit, or simply having two CMs, or the combination of CMs and GSS, or the fact that we're using VMs, or that we're using a clustered database, or that we're undersized, etc. The variables are endless. I will try to test that more formally in the next few weeks.
I have two comments about multi-data center DR.
I know what you mean. The current TES fault tolerance architecture is apparently meant mainly for server component failure. Since our datacenters are in the same city, a few miles apart, and we have a robust pipe between them, we've never had issues with our other systems or on 5.3. I can imagine this not being workable for cross-country datacenters (which is the DR best practice).
We have just finished setting up our PRD environment, where I am doing practice upgrade passes, and that is where I will do DR testing. I have taken pains to disable all jobs and agents, since it is mainly the CM latency I am concerned with. We also have an Oracle cluster in one DC and a standby cluster in another. I will compare the difference in latency when all components are in the same DC vs. when they are not. We also need to test for when a DC becomes unreachable. Systems with redundant architecture should continue to function on one DC. So I put the FM where the backup master is and put the primary master by itself, since the backup master cannot come up without the FM, but the primary can if you take FT OFF (tesm command option). We plan on having CM1 in DC1 and CM2 in DC2. So I will see how it goes next week. Actually I have it set up like this now, I just haven't done the formal stopwatch comparisons yet. If it is really bad, I will have to redo the plan so that everything is in one datacenter most of the time, and when we switch over to the other datacenter, we switch over everything. That will also mean that I can only have one CM active, which was not something I originally planned on. This impacts our database and OS patching strategy.
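For anyone wanting something more repeatable than a stopwatch, here is a minimal Python sketch of the same-DC vs. cross-DC comparison. It only times TCP connects (including name resolution) from wherever you run it, so it is a rough proxy for network latency between components and not a measure of CM response time. The function names, sample count, and timeout are my own assumptions, not anything from TES.

```python
import socket
import statistics
import time


def measure_connect_latency(host, port, samples=5, timeout=2.0):
    """Return TCP connect times (seconds) for `samples` attempts to host:port."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Time only the connection setup; the socket is closed right after.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        timings.append(elapsed)
    return timings


def summarize(timings):
    """Reduce a list of timings to min / median / max for side-by-side comparison."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }
```

You would run it once from a host in each DC against the same target, e.g. `summarize(measure_connect_latency("db.dc1.example", 1521))` (hypothetical hostname and port), and compare the medians.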
Our performance issues in 6.1 are mainly with navigation (though during load tests we notice queueing on the master server). As mentioned before, in 5.3 we've already used the two-masters-in-separate-datacenters configuration with no issues. Hopefully 6.1 is not too different. Well, actually I'm hoping it will be better.
Ah, forgot to add, regarding Tracy's question about the FT architecture and what its value is if it only covers one datacenter: even though we use it across two separate datacenters, the value for us is that during our monthly server/OS patching and maintenance, we are still able to maintain 24 x 7 availability for the master because of the FT feature. We just make sure that the FM and both masters are not patched at the same time. This will be even more important when we move to Linux, where patching downtime takes an entire hour.
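The "not patched at the same time" rule is easy to sanity-check against a patch calendar. Below is a rough Python sketch, assuming the strictest reading (no two of the three FT components may have overlapping windows). The component names and times are made up for illustration and the function is hypothetical, not part of any Tidal tooling.

```python
from datetime import datetime
from itertools import combinations


def overlapping_windows(windows):
    """Return pairs of component names whose patch windows overlap.

    `windows` maps a component name to a (start, end) datetime tuple.
    """
    clashes = []
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(
        sorted(windows.items()), 2
    ):
        # Two intervals overlap when each one starts before the other ends.
        if a_start < b_end and b_start < a_end:
            clashes.append((a, b))
    return clashes


# Hypothetical schedule: names and times are illustrative only.
schedule = {
    "primary_master": (datetime(2024, 1, 6, 1, 0), datetime(2024, 1, 6, 2, 0)),
    "backup_master": (datetime(2024, 1, 6, 2, 0), datetime(2024, 1, 6, 3, 0)),
    "fault_monitor": (datetime(2024, 1, 6, 2, 30), datetime(2024, 1, 6, 3, 30)),
}
print(overlapping_windows(schedule))
```

In this made-up schedule the backup master and FM windows collide, which is the case you would want flagged before the patch night.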
But I definitely agree that the value of fault tolerance would be extended by a multi-datacenter architecture for DR purposes. Of course, even if Tidal has multi-datacenter DR, if most of the apps running the jobs don't, then it's moot. That is why most of our mission-critical apps have to have full redundancy across datacenters.
Hi Carolanne, We are in the middle of upgrading TES from 5.3.1 to 6.1. We have set it up in lower environments and are testing the upgrade. We also have a similar setup (PM, BM, FM, 2 CMs, load balancer). I’m still looking into options for how to set up DR. Have you installed a separate CM for DR purposes? I’m interested to know about your DR setup. Can you share your experiences and recommendations?
We are still in the middle of testing the two-CMs-in-two-datacenters setup. Again, since this is not a supported configuration, I really can't give any advice other than what I have already shared. I am sharing my experience, hoping that others who have figured this out ahead of us will share too.
Once I have something more conclusive, I will share.