
ASA HA failover pair best design.

Mark_Frayhard
Level 1

Hello.

I am using a pair of switches (SW1, SW2) for the failover link between the ASA nodes and for several data interfaces (C, D).

I am also using two other pairs of switches (SW3/SW4 and SW5/SW6) for the other data links on the ASA nodes (inside, B).

 

When I rebooted SW1 on 3 March, I expected a failover to happen and ASA_5525_2 (standby) to become active. But something went wrong. It looks like a split brain occurred and both ASA nodes became active. I can't understand why the nodes didn't hear each other's hello packets through the inside or B interfaces. (Maybe I am interpreting the logs the wrong way.) As a result, the network didn't work correctly until SW1 had finished booting.

A few more details: the ASA monitors all interfaces (inside, B, C, D) and has OSPF adjacencies with SW3/SW4 and SW5/SW6. My configuration and logs:

 

failover
failover lan unit primary
failover lan interface failover GigabitEthernet0/7
failover key *****
failover link failover GigabitEthernet0/7
failover interface ip failover 1.1.1.1 255.255.255.0 standby 1.1.1.2

 

"show failover history" on ASA_5525_1:
11:10:35 MSK Mar 3 2019
Active                     Failed                     Interface check

11:10:36 MSK Mar 3 2019
Failed                     Just Active                HELLO not heard from mate

11:10:36 MSK Mar 3 2019
Just Active                Active Drain               HELLO not heard from mate

11:10:36 MSK Mar 3 2019
Active Drain               Active Applying Config     HELLO not heard from mate

11:10:36 MSK Mar 3 2019
Active Applying Config     Active Config Applied      HELLO not heard from mate

11:10:36 MSK Mar 3 2019
Active Config Applied      Active                     HELLO not heard from mate

 

"show failover history" ASA_5525_2:

11:10:36 MSK Mar 3 2019
Standby Ready              Just Active                Interface check

11:10:36 MSK Mar 3 2019
Just Active                Active Drain               Interface check

11:10:36 MSK Mar 3 2019
Active Drain               Active Applying Config     Interface check

11:10:36 MSK Mar 3 2019
Active Applying Config     Active Config Applied      Interface check

11:10:36 MSK Mar 3 2019
Active Config Applied      Active                     Interface check

11:16:51 MSK Mar 3 2019
Active                     Cold Standby               Failover state check

11:16:53 MSK Mar 3 2019
Cold Standby               Sync Config                Failover state check

11:17:01 MSK Mar 3 2019
Sync Config                Sync File System           Failover state check

11:17:01 MSK Mar 3 2019
Sync File System           Bulk Sync                  Failover state check

11:17:14 MSK Mar 3 2019
Bulk Sync                  Standby Ready              Failover state check

 

<185>Mar 03 2019 11:10:30 ASA_5525 : %ASA-1-105005: (Secondary) Lost Failover communications with mate on interface inside
<185>Mar 03 2019 11:10:30 ASA_5525 : %ASA-1-105009: (Secondary) Testing on interface inside Passed

<185>Mar 03 2019 11:10:36 ASA_5525 : %ASA-1-103001: (Primary) No response from other firewall (reason code = 4).
<185>Mar 03 2019 11:10:35 ASA_5525 : %ASA-1-104002: (Primary) Switching to STANDBY - Interface check
<185>Mar 03 2019 11:10:36 ASA_5525 : %ASA-1-104001: (Primary) Switching to ACTIVE - HELLO not heard from mate.

 

Would you mind giving your opinion on my network design and on why this happened? I could use a separate pair of switches for the failover link, but I don't yet understand whether I have to. I've read the Cisco guide https://www.cisco.com/c/en/us/td/docs/security/asa/asa95/configuration/general/asa-95-general-config/ha-failover.pdf (Scenario 3—Recommended), and it looks like my network topology.

I need your help, please.


Can you show the configuration you have on both switches for the failover link? It seems to me there is some inconsistent configuration on the switch side, where the failover is set up.

please do not forget to rate.

Ilkin
Cisco Employee
Mark, if you confirm that all interfaces, including inside and B, were monitored (in the output of show failover), and that connectivity over these interfaces was not affected by SW1 directly or indirectly, here is what might have happened:

A status change on the failover interface does not by itself trigger failover. It first triggers a connectivity check over the monitored data interfaces.
A change in the link status of interface C or D does trigger failover in this case, since these interfaces are monitored and the failover interface policy is the default (1).
The link status check, when run alongside the reachability check, completes faster than reachability testing at higher layers, and in this case it triggers failover. As a result, ASA_5525_1 becomes failed and ASA_5525_2 becomes active for a moment.
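
As a minimal sketch of the setting referred to above: `failover interface-policy` sets how many monitored interfaces must fail before a unit is marked failed. The value shown is the default mentioned in this reply; it is an illustration, not a line taken from Mark's posted configuration.

```
! Global configuration mode.
! Default policy: the unit is marked failed when 1 monitored interface fails.
failover interface-policy 1

! Privileged EXEC mode: check which interfaces are monitored and their state.
show failover
```

With the default policy, a link-down on C or D alone is enough to mark the unit failed, which matches the "Interface check" reason in the history output.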

ASA_5525_1
11:10:35 MSK Mar 3 2019
Active                     Failed                     Interface check

ASA_5525_2
11:10:36 MSK Mar 3 2019
Active Config Applied      Active                     Interface check

After switchover the state transitions could have been as follows:

on ASA_5525_1 : my=Failed,peer=Standby Ready -> "(Primary) No response from other firewall (reason code = 4)" -> my=Failed,peer=Failed -> "(Primary) Switching to ACTIVE - HELLO not heard from mate" -> my=Active,peer=Failed.

on ASA_5525_2 : my=Active,peer=Active -> "(Secondary) No response from other firewall (reason code = 4)" -> my=Active,peer=Failed.

After the first switchover, the peer state information is not updated on either unit.
One possible explanation is that each firewall could not reach its peer over the remaining data interfaces, and the pair ended up in a split-brain condition. Reason code 4 means there is no reachability over the failover and data interfaces and the failover link is down.

I would suggest making sure that the SW1 reload does not impact connectivity over the inside and B interfaces, and then re-testing the same steps during a maintenance window (or in a lab).
If the issue is still reproducible, you can open a TAC case with all the details for further analysis.

I kind of disagree with you. By default, interfaces are monitored as soon as you configure failover on the box, unless they are subinterfaces. In that case, yes, you need to enter a command in the CLI to monitor those interfaces.
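
To illustrate the point about subinterfaces, here is a minimal sketch of enabling monitoring explicitly. The interface name `dmz-vlan10` is hypothetical and does not appear anywhere in this thread; substitute the nameif of your own subinterface.

```
! Global configuration mode.
! Hypothetical subinterface nameif; monitoring must be enabled explicitly.
monitor-interface dmz-vlan10

! Privileged EXEC mode: list the monitored interfaces and their status.
show monitor-interface
```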

please do not forget to rate.

Yes, but the point is to confirm that when SW1 is reloaded, connectivity between the ASAs over interfaces C and D is in place.
If the units cannot contact each other at all due to issues in the transmission path, split brain will occur.
If the units cannot or do not contact each other when there are no issues in the transmission path and split brain still occurs, then the case needs further analysis.