Tidal Stability on Windows vs. Linux

Amir Jalali · 08-20-2013

Hi all,

We are currently installing Tidal 6.1 on Windows Server 2008 R2.

We have hit a few hiccups during implementation in terms of stability and reliability.

I would like to know if anyone has experience with Tidal on both Windows and Linux and could give a comparison.

Also, does anyone know which platform Tidal was initially developed for?

Thanks for the help,

Amir

Carolanne Fougerat · 08-21-2013

I have logged a case with Cisco to find out how many clients are already running TES 6.1 in a PRD setting, specifically because of the same concern. I also wonder whether, if not many clients have moved to 6.1, they will extend the deadline for 5.3 support again.

We are in the middle of testing TES 6.1 on Linux, having been on 5.3 on Windows. We are for the most part stable on 5.3, but our experience on 6.1 has been a mixed bag (some teams I do not hear complaining too much, but teams that edit jobs at high frequency are really feeling the pain). Given the frequency of bug-fix releases and the number of open cases we have, I cannot feel comfortable going live in PRD until we get our testing environment stabilized. Additionally, the unexplained slowness in doing some things really is a head scratcher.

I do know Tidal was initially developed for Windows, but from what I have been told it is now supposed to work well on either Windows or Unix.

Amir Jalali · 08-21-2013

Thanks for sharing your experience, Carolanne.

I'm not sure what issues your team is experiencing, but we are currently working on a problem where entire days go missing from the schedule.

It would be helpful if the community were a bit more active (and I share some of the blame) so we could learn from one another's experiences.

We are pushing for the upgrade because we need to use Tidal with SQL Server 2012, which 5.3 does not support.

Please keep us posted on your implementation.

Lumi Mihalcea · 08-22-2013

My environment is small: 1 master and 1 CM on Windows Server 2008 R2 with an Oracle database, running version 6.1.0.279. There are 2 things I would like to bring up.

First, we used to be extremely slow with just 2 users on at the same time (more than 2 meant you could walk away and get coffee), but I had a WebEx with one of the best engineers and she helped me tweak the DSP and props files; the difference is night and day. It makes such a difference after you have fought with it day after day for a few months.

Second, I reported a bug about missing future schedules; we compile 7 days ahead. So far it seems to happen about once a month. The log says the day compiled, but it's not there, yet if you look at a daily job's history, you can see that the job is scheduled for that day. One thing I tried was taking the CM down and bringing it back up; another thing I did was wait for the next day, and then 2 days showed up: the missing day and the newly compiled one. I sent TAC the logs and the ticket is still open; they are having trouble replicating it.
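For anyone who wants to catch this without eyeballing the calendar, the check itself is simple: list the days that should exist in the look-ahead window and see which ones are absent. A minimal Python sketch, assuming you can already pull the list of compiled dates out of the CM or the database somehow (how you get them is site-specific and not shown); `missing_schedule_days` is a hypothetical helper, not part of TES:

```python
from datetime import date, timedelta


def missing_schedule_days(compiled_days, start, horizon_days=7):
    """Return the dates in [start, start + horizon_days) with no compiled schedule.

    compiled_days: iterable of datetime.date values that actually appear in the
    schedule (hypothetical input; obtaining it from TES is left to the reader).
    """
    present = set(compiled_days)
    # Every day the look-ahead window should contain, e.g. 7 days ahead.
    expected = (start + timedelta(days=i) for i in range(horizon_days))
    return [d for d in expected if d not in present]
```

For example, with a 7-day window starting 2013-08-22 and every day present except 2013-08-24, `missing_schedule_days(compiled, date(2013, 8, 22))` returns `[date(2013, 8, 24)]`.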

Carolanne Fougerat · 08-22-2013

Speaking of missing schedules, I have encountered this one time. How I uncovered it was actually interesting. A user was complaining that when they inserted a job, the effective dates they were given in the prompt didn't look right. Of course, when I tried to replicate it, I couldn't.

Then it occurred to me that since we had two CMs, I should try it on each CM. I found out that one CM (where I couldn't replicate it) had all the schedules in it. The other CM (where the problem was) had missing days in the schedule.
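The same comparison can be done mechanically: pull the compiled dates each CM shows and take the difference in both directions. A hypothetical sketch; `schedule_drift` is not a TES function, and the two input lists are assumed to come from whatever export or query you can run against each CM (date objects or ISO date strings both work):

```python
def schedule_drift(days_on_cm1, days_on_cm2):
    """Return (only_on_cm1, only_on_cm2): compiled dates that one CM shows and
    the other does not. Two empty lists mean the CMs agree on the schedule."""
    cm1, cm2 = set(days_on_cm1), set(days_on_cm2)
    return sorted(cm1 - cm2), sorted(cm2 - cm1)
```

In the situation above, the healthy CM's list would show up in the first result and the CM with missing days would have nothing unique in the second.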

The implications of this were kind of sobering, given my discomfort from the beginning about not really being able to see the true master database status through the CMs (unlike in 5.3, where what you see in the GUI IS the master database). It really dawns on you how much work this architecture has to do to keep everything in sync, and what happens when it is not.

I ended up rebuilding the database schema objects on the CM that had the issue, since bouncing it didn't fix it. The missing schedule showed up after that. Of course, it took 2.5 hours to rebuild the TES schema.

Joe Fletcher · 10-07-2013

Hi,

I actually run TES on a mixture of Solaris and Linux, but I don't expect having a multi-platform setup makes too much difference (all flavours of Unix after all). The critical things with TES 6.x seem to centre around the tuning of the Java components.

The client manager is a hungry beast. It barely starts in less than 8 GB of RAM, and if you have even a few concurrent users, start at 64 GB and go up from there. Because of the way data is now cached by the CM (either via the embedded Derby DB or an external application database), it is critical to have very good links between the CM and the master database. In geographically dispersed environments the recommendation is to use a local external caching database for best performance, but you still need to remember that updates between the cache and the master DB will be subject to whatever latencies exist between the two. Anything much slower than decent LAN speeds will lead to very patchy performance, especially in complex job environments with lots of updates taking place.

It's also very sensitive to browser specifics. Anything lower than IE9 pretty much won't work (as per the product notes), but IE10 does seem to help with some memory leak issues that we've observed. Firefox is also very picky, with some features working in one point release and then failing in the next.

It's certainly far more critical with T6 to get the underlying infrastructure correct and highly optimised than it was with 5.3.

Cheers

Carolanne Fougerat · 10-10-2013

Hi Joe - all great points.

A question about your environment: is your GUI navigation experience pretty stable performance-wise? Are there times in the day when double-clicking a job to edit it takes longer than 4 to 5 seconds? How about scrolling down and filtering, or expanding job groups? How many concurrent users did you plan for at most, and what is your CPU count on the master and CM? Also, how many jobs do y'all run daily in total? Just getting an idea of how folks are sizing their systems.

Joe Fletcher · 10-10-2013

I find that with a very locally concentrated setup, i.e. master, CM and client browser on the same network, it's generally good. However, when I've asked our remote offices to test, they've said it's extremely slow. I've not been able to set up a properly dispersed rig, i.e. a local caching database at the remote site. It's likely we'll try to side-step that using some kind of Citrix-type solution. In effect we'll serve out the browser app via some kind of terminal services, so we keep the major components as close together as possible.

Daily job count is currently in the low thousands but that is steadily increasing.

Carolanne Fougerat · 10-16-2013

Thanks for your response. I agree that a Citrix or RDP/RemoteApp solution is better than a remote CM, given that CM-to-master communication latency will be an issue. We are telling our regional folks the same thing. And apparently a third CM is not recommended anyway.

jpforums2 · 10-22-2013

We have the master and a client manager in one datacenter, and we are planning to install a second client manager in a different datacenter. Has anyone here had this kind of setup? How is the performance? I assume the users at the remote datacenter should not have any performance issues, as they have a separate client manager there, but I'm not sure whether master-to-client-manager sync latency might cause performance issues. Please share your experiences and any recommendations.

John

Carolanne Fougerat · 10-24-2013

Since we are already seeing sync issues between two CMs with everything set up in one datacenter in our DEV instance, where there are a lot of users in the system, it's quite a challenge for us to know what will happen when one of the CMs is in another datacenter. There may also be issues inherent to having more than one CM, which we are still trying to understand with Cisco support since they are head scratchers.

We do have the two-CMs-in-two-datacenters setup in my PRD upgrade practice instance (but since that instance is a copy of my 5.3 PRD database, I had to disable all jobs, queues and agents). GUI navigation seems OK on both CMs, but then again I am the only one using that instance and there is no job activity.

I will let you know once we conclude our DR testing, but again, this multi-datacenter DR setup is not supported by Tidal, and what works or doesn't work for us may be different at your site.

jpforums2 · 10-24-2013

Regarding the sync issues between the 2 CMs in one datacenter: if you have not done so already, you might check whether your load balancer session persistence is set to "sticky".
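For anyone unfamiliar with the term, "sticky" session persistence means the load balancer remembers which backend a session was first sent to and keeps sending that session there, so a user doesn't bounce between two CMs that may hold different cached views. A toy Python sketch of the idea only (round-robin for new sessions, affinity afterwards); real load balancers usually key on a cookie or source IP, and the backend names are made up:

```python
from itertools import cycle


class StickyBalancer:
    """Toy load balancer: new sessions are assigned round-robin, and every later
    request carrying the same session id goes back to the same backend."""

    def __init__(self, backends):
        self._next_backend = cycle(backends)
        self._affinity = {}  # session id -> backend chosen on its first request

    def route(self, session_id):
        if session_id not in self._affinity:
            self._affinity[session_id] = next(self._next_backend)
        return self._affinity[session_id]
```

With `lb = StickyBalancer(["cm1", "cm2"])`, `lb.route("alice")` gives `"cm1"`, `lb.route("bob")` gives `"cm2"`, and a second `lb.route("bob")` stays on `"cm2"`, where plain round-robin would have sent it to `"cm1"`.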

John

Carolanne Fougerat · 10-25-2013

"Sticky sessions" is in fact the only guidance I could give our network folks when setting up the load balancing. That is definitely a requirement. I asked support whether they had more information, but they don't make it available. Are you using Cisco GSS or another load balancer? Were there any other configurations you had to do?

I am currently working with our network team on a maintenance site redirect for when the webservers are down, so that users don't just get a 404 error but at least get some information about the downtime and when to expect the system to be available again.

We are also working on a way to take a webserver/CM out of the load balancer pool even while it is up, by removing a custom file so that the load balancer can no longer reach it via HTTP. That way we can access the webserver as admins without letting users in during troubleshooting, etc.
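This works because most load balancers decide whether a node stays in the pool by periodically requesting a URL and checking for a 200. A minimal Python sketch of the server side, assuming a hypothetical marker file and probe path (`/lb-check`, `/tmp/lb-in-rotation.txt` and port 8081 are all made up; in practice the webserver in front of the CM would simply serve a static file): delete the marker and the probe starts getting 404s, so the load balancer drops the node while admins can still reach it directly.

```python
import os
from http.server import BaseHTTPRequestHandler

# Hypothetical marker path; the real file name and location are site-specific.
MARKER_PATH = "/tmp/lb-in-rotation.txt"


def health_status(marker_path=MARKER_PATH):
    """Status the load balancer probe should see: 200 keeps the node in the
    pool, 404 (marker removed) takes it out while the server keeps running."""
    return 200 if os.path.isfile(marker_path) else 404


class HealthCheckHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/lb-check":
            self.send_error(404)
            return
        status = health_status()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"OK\n" if status == 200 else b"OUT OF ROTATION\n")
```

To try it, run `HTTPServer(("", 8081), HealthCheckHandler).serve_forever()` (with `HTTPServer` from `http.server`) and point the load balancer's HTTP probe at `/lb-check`.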

Do you get the RPC error a lot, regardless of whether it's IE or Firefox?

Carolanne Fougerat · 10-25-2013

The sync issues I was talking about were those very rare instances when one CM doesn't show all the compiled days that the other does. Also, is your DSP named the same on both CMs? We actually tried it both ways, first naming them differently so that users could easily tell which one they were on for troubleshooting, but that makes load balancing not work for sacmd etc.

jpforums2 · 10-25-2013

We observed some other sync issues in the job output log, even though the session was set to sticky. We then realized that the session persistence property was not enabled for Java and was enabled only for ASP pages. When we enabled the property for Java, the other sync issues we had observed were resolved. As we haven't started rigorous testing yet, I have not encountered other issues like missing compile days on a CM. We are waiting on a bug-fix patch from Cisco; once it is applied, I will start testing again.

Yes, we are seeing the "RPC call failed" messages regardless of whether it's IE or Firefox.