
ASA failover: secondary ASA disabled failover on its own

grischast
Level 1

Hi all

I have a failover pair of ASA 5520s (Software Version 8.2(4)4) located in two different data centers.

Because of a network issue, the layer 2 connection between the two locations was interrupted for a couple of seconds, and the ASAs went into split-brain, as one would expect.

The thing is that after approx. 1 minute the secondary ASA switched off its failover configuration (i.e. "show run" gives "no failover") without anybody telling it to do so. Here is the "show failover history" of the device:

07:57:34 MESZ Aug 15 2011

Standby Ready              Just Active                HELLO not heard from mate

07:57:34 MESZ Aug 15 2011

Just Active                Active Drain               HELLO not heard from mate

07:57:34 MESZ Aug 15 2011

Active Drain               Active Applying Config     HELLO not heard from mate

07:57:34 MESZ Aug 15 2011

Active Applying Config     Active Config Applied      HELLO not heard from mate

07:57:34 MESZ Aug 15 2011

Active Config Applied      Active                     HELLO not heard from mate

07:58:03 MESZ Aug 15 2011

Active                     Cold Standby               Failover state check

07:58:18 MESZ Aug 15 2011

Cold Standby               Disabled                   HA state progression failed

At this point failover was switched off completely, and the split-brain persisted even after the layer 2 connection had been re-established.
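To catch this kind of transition automatically, the history output can be scanned for a move into "Disabled" that was not caused by a configuration command. The following is only a rough sketch: the function names are made up for illustration, the sample is a subset of the lines quoted above, and it assumes that a manually entered change always carries the reason "Set by the config command" (the history in this thread only shows that reason for re-enabling failover).

```python
import re
from datetime import datetime

# Matches timestamp lines such as "07:57:34 MESZ Aug 15 2011".
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}) \S+ (\w{3} \d{1,2} \d{4})$")


def parse_history(text):
    """Return a list of (timestamp, from_state, to_state, reason) tuples."""
    entries = []
    stamp = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            # The time zone label (e.g. MESZ) is dropped; times stay device-local.
            stamp = datetime.strptime(" ".join(match.groups()), "%H:%M:%S %b %d %Y")
            continue
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line)
        if stamp is not None and len(fields) == 3:
            entries.append((stamp, *fields))
    return entries


def self_disabled(entries):
    """Entries that end in Disabled without a config command (assumed marker)."""
    return [e for e in entries
            if e[2] == "Disabled" and e[3] != "Set by the config command"]


SAMPLE = """\
07:57:34 MESZ Aug 15 2011
Standby Ready              Just Active                HELLO not heard from mate
07:57:34 MESZ Aug 15 2011
Active Config Applied      Active                     HELLO not heard from mate
07:58:03 MESZ Aug 15 2011
Active                     Cold Standby               Failover state check
07:58:18 MESZ Aug 15 2011
Cold Standby               Disabled                   HA state progression failed
"""

if __name__ == "__main__":
    history = parse_history(SAMPLE)
    for stamp, src, dst, reason in self_disabled(history):
        elapsed = (stamp - history[0][0]).total_seconds()
        print(f"{stamp:%H:%M:%S} {src} -> {dst} ({reason}), "
              f"{elapsed:.0f}s after the first event")
```

How the output of "show failover history" gets collected from the device (SSH, a management tool, etc.) is left open here.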

This is no good. :( I have searched for "HA state progression failed" without finding any useful result or explanation.

Why did the device switch off failover on its own, and how can we ensure that it won't do this again?

Best regards,

Grischa

5 Replies

varrao
Level 10

Hi Grischa,

Can you confirm whether the failover links are connected directly to each other, with the rest of the interfaces connected through a switch?

You might be hitting this bug:

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCtg55257

However, further research on the issue suggests that one possible workaround would be to manually enable failover on the secondary device again.

Let me know if this helps.

Thanks,

Varun

Thanks,
Varun Rao

Hi Varun

The devices are installed at different locations. The failover link is an MPLS-L2transport across the provider's backbone from one location to the other. That is, the failover interfaces are connected to a switch on either side, and the switches have a dot1q trunk to the provider's access routers, which connect the two ports via MPLS-L2transport:

ASA1 - Switch1 - PE1-Router ----MPLS-L2transport---- PE2-Router - Switch2 - ASA2

Due to a network issue, the MPLS-L2transport was interrupted for a couple of seconds, and afterwards we had a persistent split-brain situation.

Of course I have enabled failover manually again. But the plan is for the ASAs to recover into normal operation by themselves as soon as the network is stable again.

Regards,

Grischa

I would suggest you open a TAC case for it, because this seems to be unusual behavior. If you manually enable failover again, does the secondary become standby and function properly?

-Varun


Yes, the only thing I needed to do was issue "failover" on the secondary. It detected its active mate and went properly into standby:

09:16:18 MESZ Aug 15 2011

Disabled                   Negotiation                Set by the config command

09:16:19 MESZ Aug 15 2011

Negotiation                Cold Standby               Detected an Active mate

09:16:21 MESZ Aug 15 2011

Cold Standby               Sync Config                Detected an Active mate

09:16:31 MESZ Aug 15 2011

Sync Config                Sync File System           Detected an Active mate

09:16:31 MESZ Aug 15 2011

Sync File System           Bulk Sync                  Detected an Active mate

09:16:31 MESZ Aug 15 2011

Bulk Sync                  Standby Ready              Detected an Active mate

I guess we will go the TAC way if we encounter this situation a second time. This time we will be warned and know where to look.
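As a simple early warning, a saved copy of "show run" could be checked for the "no failover" line mentioned earlier. This is just a minimal sketch: the function name and the sample snippets are invented for illustration, and it assumes that an enabled unit shows a bare "failover" line in its running configuration.

```python
def failover_enabled(running_config: str) -> bool:
    """True if the config has a bare 'failover' line and no 'no failover' line."""
    lines = [line.strip() for line in running_config.splitlines()]
    if "no failover" in lines:
        return False
    # Assumption: an enabled unit lists a standalone "failover" command.
    return "failover" in lines


# Illustrative snippets only, not taken from the real devices.
ENABLED = "hostname asa2\nfailover\nfailover lan unit secondary\n"
DISABLED = "hostname asa2\nno failover\nfailover lan unit secondary\n"

if __name__ == "__main__":
    for name, config in (("enabled", ENABLED), ("disabled", DISABLED)):
        print(name, failover_enabled(config))
```

A scheduled job could run a check like this against the secondary and raise an alert when it returns False.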

Is there really no documentation available for the "HA state progression failed" message? What does it mean, and how is it usually triggered?

Regards,

Grischa

Well, going by the symptoms and conditions of your issue, it definitely seems to be the bug I linked. I did some research on it but could not find any documentation, since this is not expected behavior. I guess opening a TAC case would be the right step if it happens again. When the first instance of this issue was noticed, it could not be reproduced.

It is encountered whenever the switch has an issue. If it happens again, a TAC case would be best, so that we have some data to pull while the secondary is in the failed state and can identify whether it is the same problem.

Let me know if this answers your question.

Thanks,

Varun
