Re: IPCC - Design to Support DR - Page 2

dtran · ‎03-06-2008

Hello all,

Has anyone deployed IPCC Enterprise to support DR? If so, please share your experience with your deployment!!!

I am having a hard time getting my IPCC environment to work as designed. I have a duplexed IPCC Enterprise 6.0 environment with side A and side B at separate locations, connected via a full DS3 MPLS connection, and I cannot get the IVRs to go active when PG1a is offline. When I shut down PG1a, the IVR PIMs on PG1b do not go active for some reason and I get a busy signal on all of my toll-free numbers.

Please share your experience if you have run into an environment like mine!!! Thanks very much in advance!!!

Danny

dtran · ‎03-12-2008

Hello Jeff,

The A side is in CA and the B side is in TX, and the latency between the two locations is within the acceptable range (35-40 ms). I did read the SRND and understand that Cisco recommends a private point-to-point link for the private network. But I am pretty sure other customers have done it successfully with my type of design, so I still think there is a misconfigured setting somewhere.

Thank you very much Jeff!!! I really appreciate everyone taking time out of their busy schedules to respond to my post!!!

Danny

cherilynn1030 · ‎03-12-2008

I hope an answer is found for you. I'm watching this thread like a hawk. We are experiencing our own headaches with heartbeats between our PGs. The whole DR scenario just seems to be very flaky for us. I'm hoping to glean some knowledge from this for our own scenario...

dtran · ‎03-12-2008

Hello Cheri,

I understand you have a straight fiber link between side A and side B. If that is the case, you might want to look into your QoS settings for the private network. Heartbeats should not be an issue on this type of connection.

My setup is quite simple but it's frustrating when things don't work right. I am reaching out for any help I can get and Jeff Marshall has been very helpful.

Thanks to everyone for responding !!!

Danny

jeff.marshall · ‎03-13-2008

'Watching like a hawk', well in that case I hope I'm not the field rodent! ;-)

Danny: you are correct; there are customers that deviate from the high availability designs in the SRND. Of course, don't admit that in public. Most are successful. Some (considering Murphy's Law) who have never had a WAN problem suddenly see a major outage that takes down their call center for days (how much was that T-1?). Others just live with the inconvenience. I've worked with this product for too long to disrespect a 'by the book' design, but I understand business and monetary pressures, and concessions can be made. I think your issue is something else though. We'll get to the bottom of it.

Cherilynn: your concern seems to speak to a different issue. As I understand it, you have a normally functioning geographically split duplex PG pair that periodically disconnects. Is that correct? If by 'flaky' you mean 'not working as I thought it would in my mind' then we have some work to do and this audience is happy to help.

ICM is an incredibly robust and simple product at the core. Its design is centered on very basic networking needs that were available on the market 12 years ago and the only serious network-side enhancement has been the inclusion of QoS packet marking starting in v6.0 and enhanced in v7.0. The network has to be able to recognize that there are High, Medium and Low traffic streams to/from the real-time components of the system and (a) prioritize those streams correctly from the server and (b) prioritize them correctly across the network. There are some other basic failover caveats and conditions but if the network isn't right then nothing else will function as intended. To look at it another way, if the foundation of the house isn't straight, plumb and level then the rest of the house will never be true. These are all fixable issues but may require serious foundational adjustments and in some cases, investment.

/Jeff

P.S. I'm heading to VoiceCon next week, I hope everyone can stop by and say 'Hi' at the Dimension Data booth.
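
For anyone following along, here is a rough sketch of what "prioritize those streams correctly" means in practice: High traffic always leaves before Medium, and Medium before Low, with first-in-first-out order inside each class. This is a generic strict-priority queue, not ICM's implementation. The class name `PriorityLink` and the example payloads are made up for illustration, and which real traffic lands in which class is an assumption here.

```python
import heapq
from itertools import count

# Lower number = served first. The three class names mirror the
# High/Medium/Low streams described above; the mapping is illustrative.
PRIORITY = {"high": 0, "medium": 1, "low": 2}


class PriorityLink:
    """Hypothetical strict-priority send queue (not an ICM component)."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within a class

    def send(self, traffic_class, payload):
        heapq.heappush(
            self._heap, (PRIORITY[traffic_class], next(self._seq), payload)
        )

    def drain(self):
        """Return queued payloads in the order they would be transmitted."""
        out = []
        while self._heap:
            _, _, payload = heapq.heappop(self._heap)
            out.append(payload)
        return out


link = PriorityLink()
link.send("low", "bulk sync")       # assumed low-priority traffic
link.send("high", "heartbeat 1")    # assumed high-priority traffic
link.send("medium", "call event")   # assumed medium-priority traffic
link.send("high", "heartbeat 2")
order = link.drain()
print(order)
```

The point of the sketch is that a heartbeat queued behind bulk traffic still goes out first. If either the server or any hop in the network treats all three classes the same, that guarantee is lost.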

cherilynn1030 · ‎03-13-2008

I hesitate to say normally functioning. We've been live since 9/17/07. The longest we've gone without issues is 29 days. We've done 5 ES patches, had a bug named in our honor, and one ET patch. More TAC cases than I know what to do with. We seem to have heartbeat issues, yet the network shows nothing even CLOSE to getting to that point. We have (and here's my term, flaky) issues happening with our agents and CTIOS: Agent A consult transfers a call to Agent B, but Agent B doesn't get the call. Instead, Agent A gets connected to Agent C, who is in the same skill group. So flaky, to both myself and our end users... yeah.

Our PGs are geographically split: 10 miles apart, with a fiber connection. We have separate VLANs, and the heartbeat is on its own EoMPLS. I feel like we're all just grasping at straws, and the confidence factor of this solution on our business end is dwindling away fast.

jeff.marshall · ‎03-14-2008

Cherilynn,

Can you describe your installation a bit including software versions and product type (e.g. UCCE or System UCCE)?

I won't excuse ESs and bugs, but this is software and these things happen. I (and others here) have more than our share of these things attributed to our names. What is inexcusable is the dwindling confidence factor; there is no reason why someone should lose faith in a system of this caliber. This is wrong, and in my mind product issues are always fixable. Kudos to you for participating in this forum; that's a huge step to success in my mind. Not that this forum will guarantee success in solving your issues, but it tells me that you are interested in personally investing in success.

The challenge here is identifying issues from the bottom up. If you are having DR issues, then I consider that foundational, and it is either a design disconnect or a programmatic issue. Your consultative transfer issue is higher up. It is also fixable, but the foundation is key to success.

What can we help you with?

/Jeff

cherilynn1030 · ‎03-17-2008

I will! I'm going to create a new conversation, so that I don't tromp on Danny's anymore!!!

dtran · ‎03-13-2008

Hello Jeff,

This has been an ongoing issue for quite some time and I'll be a happy man if I can get to the bottom of it. I am going to try to see if I can get an outage window sooner. I would like to run the test scenario you suggested and gather logs.

We have had the environment in production for about 4 years now. We did run into some issues at the beginning, but all in all the environment has been very stable for us. I'll be a very happy man if I can get DR working. I am not a contact center guy, but I have learned a lot working with it, and I did work with an integrator to put the environment together. When we installed the environment, side A and side B were on the same network and both sides were located in the same building. The mistake was that we never did any DR test until we moved side B over to our DR location in TX, so I don't know if DR was working before the move.

Thank you very much Jeff !!! I am glad that we have you on the forum !! I am gonna try to get the logs posted up ASAP. But in the meantime let me know if you come up with anything !!!

I won't be attending VoiceCon, but I'm looking forward to seeing you at Networkers if you're gonna be there.

Danny

dtran · ‎03-24-2008

Hello Jeff,

I am scheduling an outage to bring down our call center this week. I am going to try the test scenario you suggested and gather logs. Hope you have time to go over the logs for me.

Thank you very much !!

Danny

dtran · ‎03-31-2008

Hello Jeff,

I was able to schedule an outage window last week and went through the test scenario you suggested, and I wanna give you a quick update. I did just as you suggested, shutting down one process at a time, and this time the test was successful: all processes failed over to PG1b and the call center was working normally on PG1b. The only thing that I changed since the last unsuccessful test a few months back is shown in the attachment (please take a look). I am not sure whether the change made any difference. I am planning to schedule another test, but this time I will turn off PG1a completely instead of shutting down one process at a time, and will see how PG1b reacts.

Please take a look at the attachment and let me know what you think !!!!

Thank you very much Jeff!!! I appreciate all your help!!!

jeff.marshall · ‎04-02-2008

Excellent Danny!

At this point, you can feel comfortable that the individual components are configured correctly. The changes that you made do make sense but shouldn't have seriously impacted failover. Since the B side PG is LAN connected to Call Router B, I would have most certainly set that to 'local' in the lower portion as well as to prefer Call Router B at initial installation. Great job!

Let us know how the server failover works out for you. Now we're working a bit higher up in the stack and proving success along the way.

/Jeff

adignan · ‎10-24-2008

Rbua,

Can you elaborate on the "dummy PG" and how I can use it to get around the "majority" issue? I have a fully mirrored Side A and Side B solution, and I want B to be able to stay active in the event the entire A side data center goes down. Since it will only have "half", it won't go active.

- Andy
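
To make the "majority" problem in Andy's question concrete, here is a minimal sketch under a simplifying assumption: a side may go active only if it can reach strictly more than half of the configured devices. That is a generic quorum rule used here for illustration. The real failover logic has more conditions than this, and the function name and device counts below are hypothetical.

```python
def side_may_go_active(reachable_devices: int, total_devices: int) -> bool:
    """Assumed rule: a side needs a strict majority of devices to go active."""
    return 2 * reachable_devices > total_devices


# Fully mirrored design: one device per site, and the A site is down.
# Side B reaches 1 of 2 devices, which is exactly half, so no majority.
mirrored = side_may_go_active(reachable_devices=1, total_devices=2)

# Same outage with one extra (hypothetical) "dummy" device that side B
# can still reach: B now reaches 2 of 3.
with_dummy = side_may_go_active(reachable_devices=2, total_devices=3)

print(mirrored, with_dummy)
```

Under this assumed rule, an even split can never give the surviving side a majority, which is why an extra device reachable from B would change the outcome.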