IM&P HA Manual Failover - Still failed over

sean Riley

IM&P v11.5(1) SU9

Our IM&P HA configuration has the Publisher at site A and the Subscriber at site B. We run active/passive, so all users run from site A and only move to site B if there is a failure at site A or we force a manual failover.

We had a scheduled network outage at site B, so I performed a manual failover of the site B node. The goal was to prevent each node from deciding the other was down and trying to fail over. However, that is exactly what happened: during the outage, both nodes attempted to fail over and ended up in a failed state that required manual recovery.

My question: is this expected? If I manually fail over the node at site B, shouldn't the node at site A refrain from attempting a failover after it loses heartbeats with site B? I have done this several times in the past and thought it prevented the scenario I ended up in. Maybe I need to take an additional step, such as stopping the SRM service?

I have set my heartbeat timers as shown below to try to avoid this failed-node scenario during unexpected network outages. Maybe the way I adjusted them is what caused this behavior.

Critical Service Down Delay: 90
Initialization Keep Alive (Heartbeat) Timeout: 240
Keep Alive (Heartbeat) Timeout: 240
Keep Alive (Heartbeat) Interval: 30
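To get a feel for how an interval of 30 and a timeout of 240 play out during a network outage, here is a rough sketch of a generic heartbeat failure detector. It assumes the timer values are in seconds, that heartbeats flow in both directions every interval, and that a node declares its peer down once the timeout elapses with no heartbeat received. The `simulate` function and its parameters are hypothetical and exist only for illustration; this is not how Cisco's Server Recovery Manager is implemented, and it ignores manual failover state entirely.

```python
from typing import Dict, Optional


def simulate(interval: int, timeout: int, outage_start: int,
             outage_end: int, horizon: int) -> Dict[str, Optional[int]]:
    """Hypothetical model: return the second at which each node declares
    its peer down, or None if it never does within the horizon.

    The link is down for outage_start <= t < outage_end. Times are
    assumed to be in seconds.
    """
    # Last time each node received a heartbeat from its peer.
    last_seen = {"A": 0, "B": 0}
    declared: Dict[str, Optional[int]] = {"A": None, "B": None}

    for t in range(horizon + 1):
        link_up = not (outage_start <= t < outage_end)
        if t % interval == 0 and link_up:
            # In this model, heartbeats in both directions arrive instantly,
            # so a partition silences both sides at the same moment.
            last_seen["A"] = t
            last_seen["B"] = t
        for node in ("A", "B"):
            if declared[node] is None and t - last_seen[node] >= timeout:
                declared[node] = t
    return declared


# Long outage: both nodes declare the peer down at the same time.
print(simulate(30, 240, outage_start=100, outage_end=1000, horizon=1200))
# Short outage (200 s): neither node reaches the timeout.
print(simulate(30, 240, outage_start=100, outage_end=300, horizon=1200))
```

In the first run, the last heartbeat before the outage arrives at t=90, so both nodes declare their peer down at t=330 (90 + 240); in the second, heartbeats resume at t=300 before the timeout is reached. Because a partition cuts heartbeats in both directions, the two sides reach the same conclusion simultaneously in this model. Whether SRM then acts on that for a node already in manual failover is exactly the behavior this sketch does not capture.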