Solved: HA ASA-5525 pair failed to send gratuitous ARPs during failover. Why?

jmaxwellUSAF · ‎02-23-2023

Hello.

As instructed by my senior, I was to reboot the primary ASA-5525 (9.14(3)) and then, once it was back online, reboot the secondary. At the beginning I believe I executed "failover active", but I am not certain, because I remember concluding that it was irrelevant. I rebooted the primary ASA and verified that it came back online by pressing "enter" a few times and seeing the device respond normally.

Then, on the secondary, I entered "failover active", waited about 15 seconds, and rebooted the secondary. The secondary went offline, and every connection that traversed the ASA-5525 pair lost connectivity. Clearly the failover somehow failed, because it is confirmed that the connected devices did not receive gratuitous ARPs.

The question now is: in an HA ASA-5525 pair, when executing "failover active", why would the active secondary device not send gratuitous ARPs to the downstream devices?

Thank you.

jmaxwellUSAF · ‎02-23-2023

Very relevant info here...

Solved: What protocol does HA in cisco ASA uses??? - Cisco Community

Shared with the Failover Link

Sharing a failover link is the best way to conserve interfaces. However, you must consider a dedicated interface for the state link and failover link, if you have a large configuration and a high traffic network.

Dedicated Interface

You can use a dedicated data interface (physical, redundant, or EtherChannel) for the state link. For an EtherChannel used as the state link, to prevent out-of-order packets, only one interface in the EtherChannel is used. If that interface fails, then the next interface in the EtherChannel is used.

Connect a dedicated state link in one of the following two ways:

Using a switch, with no other device on the same network segment (broadcast domain or VLAN) as the failover interfaces of the ASA device.
Using an Ethernet cable to connect the appliances directly, without the need for an external switch.

If you do not use a switch between the units, if the interface fails, the link is brought down on both peers. This condition may hamper troubleshooting efforts because you cannot easily determine which unit has the failed interface and caused the link to come down.

The ASA supports Auto-MDI/MDIX on its copper Ethernet ports, so you can either use a crossover cable or a straight-through cable. If you use a straight-through cable, the interface automatically detects the cable and swaps one of the transmit/receive pairs to MDIX.

For optimum performance when using long distance failover, the latency for the state link should be less than 10 milliseconds and no more than 250 milliseconds. If latency is more than 10 milliseconds, some performance degradation occurs due to retransmission of failover messages.
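The two thresholds in that sentence can be read as three bands. The sketch below is only one interpretation of the quoted guidance, not a Cisco tool: the function name and the band labels are made up for illustration, and treating exactly 10 ms as "degraded" is an assumption, since the text only covers "less than" and "more than" 10 ms.

```python
def classify_state_link_latency(latency_ms: float) -> str:
    """Classify a measured state-link latency against the quoted guidance.

    Band labels are hypothetical; the thresholds (10 ms, 250 ms) come from
    the quoted text.
    """
    if latency_ms < 0:
        raise ValueError("latency cannot be negative")
    if latency_ms < 10:
        return "optimal"        # below 10 ms: the stated target
    if latency_ms <= 250:
        return "degraded"       # some retransmission of failover messages expected
    return "out of range"       # beyond the 250 ms upper bound


for sample_ms in (2.5, 40, 300):
    print(sample_ms, classify_state_link_latency(sample_ms))
```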

---

Avoiding Interrupted Failover and Data Links

We recommend that failover links and data interfaces travel through different paths to decrease the chance that all interfaces fail at the same time. If the failover link is down, the ASA can use the data interfaces to determine if a failover is required. Subsequently, the failover operation is suspended until the health of the failover link is restored.

See the following connection scenarios to design a resilient failover network.

---

Scenario 1—Not Recommended

If a single switch or a set of switches are used to connect both failover and data interfaces between two ASAs, then when a switch or inter-switch-link is down, both ASAs become active. Therefore, the following two connection methods shown in the following figures are NOT recommended.

Figure 1. Connecting with a Single Switch—Not Recommended

Figure 2. Connecting with a Double-Switch—Not Recommended

---

Scenario 2—Recommended

We recommend that failover links NOT use the same switch as the data interfaces. Instead, use a different switch or use a direct cable to connect the failover link, as shown in the following figures.

Figure 3. Connecting with a Different Switch

Figure 4. Connecting with a Cable

MHM Cisco World · ‎02-26-2023

Sorry for the late reply; sometimes I need time to run a test before answering.
Anyway, I see you mention an NSK on one side of the ASA HA pair.
You can use
ethanalyzer local interface inband limit-captured-frames 30 <<- run this on the NSK while you issue "failover active" on the standby ASA, to capture whether the ASA sends a G-ARP or not.
Thanks,
MHM

jmaxwellUSAF · ‎02-26-2023

Thank you MHM.

This is very helpful, and clearly you put much work into this response.

jmaxwellUSAF · ‎02-26-2023

What is "NSK"?

Rob Ingram · ‎02-26-2023

@jmaxwellUSAF "Generally, when a failover occurs, the new active unit takes over the active IP addresses and MAC addresses. Because network devices see no change in the MAC to IP address pairing, no ARP entries change or time out anywhere on the network."....that is a quote from this guide - https://www.cisco.com/c/en/us/td/docs/security/asa/asa917/configuration/general/asa-917-general-config/ha-failover.html

jmaxwellUSAF · ‎02-26-2023

You have located the essential literature for this issue. Thank you Rob!

"Generally, when a failover occurs, the new active unit takes over the active IP addresses and MAC addresses. Because network devices see no change in the MAC to IP address pairing, no ARP entries change or time out anywhere on the network."

This seems to imply that the connected devices' ARP and mac-address tables would hold identical entries for two interfaces, so all traffic destined to the HA pair would always exit 2 interfaces on any redundantly connected device. Is that correct?

Rob Ingram · ‎02-26-2023

@jmaxwellUSAF from the book - Cisco ASA all in one

jmaxwellUSAF · ‎02-26-2023

Is there a link to this text?

And because I own the physical book, could you provide the page number?

Thank you.

Rob Ingram · ‎02-26-2023

@jmaxwellUSAF the top of page 662 - Cisco ASA All-in-One Next Generation Firewall, Third Edition.

jmaxwellUSAF · ‎02-26-2023

Hi Rob. I could not find this on Google. Could you tell me, or send me a link explaining, what "Cold Standby" and "Active Drain" mean in the output below? Thank you!

FW/sec/stby# sh failo hist
==========================================================================
From State                 To State                   Reason
==========================================================================
12:48:46 EST Feb 22 2023
Not Detected               Negotiation                No Error

12:48:50 EST Feb 22 2023
Negotiation                Cold Standby               Detected an Active mate

12:48:52 EST Feb 22 2023
Cold Standby               Sync Config                Detected an Active mate

12:49:03 EST Feb 22 2023
Sync Config                Sync File System           Detected an Active mate

12:49:03 EST Feb 22 2023
Sync File System           Bulk Sync                  Detected an Active mate

12:49:16 EST Feb 22 2023
Bulk Sync                  Standby Ready              Detected an Active mate

13:41:35 EST Feb 22 2023
Standby Ready              Just Active                Other unit wants me Active

13:41:35 EST Feb 22 2023
Just Active                Active Drain               Other unit wants me Active

13:41:35 EST Feb 22 2023
Active Drain               Active Applying Config     Other unit wants me Active

13:41:35 EST Feb 22 2023
Active Applying Config     Active Config Applied      Other unit wants me Active

13:41:35 EST Feb 22 2023
Active Config Applied      Active                     Other unit wants me Active

13:42:15 EST Feb 22 2023
Active                     Standby Ready              Other unit wants me Standby
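For anyone reading a history like this, it can help to compute how long the unit actually stayed in each state. The sketch below is a hypothetical helper, not an ASA feature: the transitions are transcribed by hand from the output above, the "EST" token is dropped because time-zone abbreviations parse inconsistently across platforms, and an English locale is assumed for the month name.

```python
from datetime import datetime

# (timestamp, from_state, to_state), transcribed from the output above.
HISTORY = [
    ("12:48:46 Feb 22 2023", "Not Detected", "Negotiation"),
    ("12:48:50 Feb 22 2023", "Negotiation", "Cold Standby"),
    ("12:48:52 Feb 22 2023", "Cold Standby", "Sync Config"),
    ("12:49:03 Feb 22 2023", "Sync Config", "Sync File System"),
    ("12:49:03 Feb 22 2023", "Sync File System", "Bulk Sync"),
    ("12:49:16 Feb 22 2023", "Bulk Sync", "Standby Ready"),
    ("13:41:35 Feb 22 2023", "Standby Ready", "Just Active"),
    ("13:41:35 Feb 22 2023", "Just Active", "Active Drain"),
    ("13:41:35 Feb 22 2023", "Active Drain", "Active Applying Config"),
    ("13:41:35 Feb 22 2023", "Active Applying Config", "Active Config Applied"),
    ("13:41:35 Feb 22 2023", "Active Config Applied", "Active"),
    ("13:42:15 Feb 22 2023", "Active", "Standby Ready"),
]

TIMESTAMP_FORMAT = "%H:%M:%S %b %d %Y"


def time_in_state(history, state):
    """Return the seconds spent in `state`, counting only completed visits."""
    total = 0.0
    entered = None
    for stamp, from_state, to_state in history:
        when = datetime.strptime(stamp, TIMESTAMP_FORMAT)
        if from_state == state and entered is not None:
            total += (when - entered).total_seconds()
            entered = None
        if to_state == state:
            entered = when
    return total


print(time_in_state(HISTORY, "Active"))         # 40.0
print(time_in_state(HISTORY, "Standby Ready"))  # 3139.0
```

Read this way, the secondary held the Active role for 40 seconds (13:41:35 to 13:42:15) before the other unit asked it to return to standby.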

Rob Ingram · ‎02-26-2023

@jmaxwellUSAF table 3 in this guide https://www.cisco.com/c/en/us/td/docs/security/asa/asa-cli-reference/S/asa-command-ref-S/show-f-to-show-ipu-commands.html

Cold Standby

The unit waits for the peer to reach the Active state. When the peer unit reaches the Active state, this unit progresses to the Standby Config state. This is a transient state.

Active Drain

Queued messages from the peer are discarded. This is a transient state.

MHM Cisco World · ‎02-26-2023

As far as I know, that is not correct. In active/standby, the unit elected as the new active always sends a G-ARP.

Why?

Because it lets the switch (SW) know that the port leading to the active unit has changed.

That is why I mentioned the NSK and how you must detect the G-ARP.

Sometimes this G-ARP is missed and the switch keeps using the previous port, which leads to the old active peer, and hence packets drop.
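The switch-side effect described here can be illustrated with a toy model of MAC learning. This is a minimal sketch, not how any particular switch is implemented: the class, the port names, and the MAC address are all hypothetical, and the only behavior modeled is that a switch records the port on which it last saw a given source MAC.

```python
class LearningSwitch:
    """Toy model: remembers the last port on which each source MAC was seen."""

    def __init__(self):
        self.mac_table = {}

    def receive(self, port, src_mac):
        # Any frame, including a gratuitous ARP, refreshes the entry for its source MAC.
        self.mac_table[src_mac] = port

    def egress_port(self, dst_mac):
        # None means the MAC is unknown; a real switch would flood the frame.
        return self.mac_table.get(dst_mac)


ACTIVE_MAC = "00a0.c969.87c8"  # hypothetical active-unit MAC

switch = LearningSwitch()
switch.receive("Eth1/1", ACTIVE_MAC)      # traffic from the old active unit
before = switch.egress_port(ACTIVE_MAC)

switch.receive("Eth1/2", ACTIVE_MAC)      # G-ARP from the new active unit
after = switch.egress_port(ACTIVE_MAC)

print(before, "->", after)                # Eth1/1 -> Eth1/2
```

If the second `receive` never happens (the G-ARP is missed), the table still points at `Eth1/1` and traffic keeps going toward the old active unit until some other frame from the new active unit refreshes the entry.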

jmaxwellUSAF · ‎02-26-2023

I now realize I did not fundamentally understand how "protocol 105" technology works. I thought it was the same as HSRP technology, it is NOT.

"Generally, when a failover occurs, the new active unit takes over the active IP addresses and MAC addresses. Because network devices see no change in the MAC to IP address pairing, no ARP entries change or time out anywhere on the network." https://www.cisco.com/c/en/us/td/docs/security/asa/asa917/configuration/general/asa-917-general-config/ha-failover.html

jmaxwellUSAF · ‎02-26-2023

As a helpful and cautionary note, the behavior described below caused a "big nightmare" event that resulted in significant financial impact to an enterprise. Professionals would do well to remember to configure VIRTUAL MAC ADDRESSES...

Active/Standby IP Addresses and MAC Addresses
For Active/Standby Failover, see the following for IP address and MAC address usage during a failover event:
1. The active unit always uses the primary unit's IP addresses and MAC addresses.
2. When the active unit fails over, the standby unit assumes the IP addresses and MAC addresses of the failed unit and begins passing traffic.
3. When the failed unit comes back online, it is now in a standby state and takes over the standby IP addresses and MAC addresses.

MAC Addresses and IP Addresses in Failover
However, if the secondary unit boots without detecting the primary unit, then the secondary unit becomes the active unit and uses its own MAC addresses, because it does not know the primary unit MAC addresses. When the primary unit becomes available, the secondary (active) unit changes the MAC addresses to those of the primary unit, which can cause an interruption in your network traffic. Similarly, if you swap out the primary unit with new hardware, a new MAC address is used.

Virtual MAC addresses guard against this disruption, because the active MAC addresses are known to the secondary unit at startup, and remain the same in the case of new primary unit hardware. If you do not configure virtual MAC addresses, you might need to clear the ARP tables on connected routers to restore traffic flow. The ASA does not send gratuitous ARPs for static NAT addresses when the MAC address changes, so connected routers do not learn of the MAC address change for these addresses.

CLI Book 1: Cisco ASA Series General Operations CLI Configuration Guide, 9.17 - Failover for High Availability [Cisco Secure Firewall ASA] - Cisco
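The rules quoted from the guide can be condensed into a small model showing why the active MAC address changes without virtual MACs and stays put with them. This is only a sketch of the quoted rules, not ASA code: the `Unit` class, the function name, and every MAC address below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    name: str
    burned_in_mac: str


def active_mac(primary: Unit, secondary: Unit,
               secondary_knows_primary: bool,
               virtual_mac: Optional[str] = None) -> str:
    """Return the MAC the active unit presents, per the quoted rules."""
    if virtual_mac is not None:
        return virtual_mac                  # configured, so known at startup
    if secondary_knows_primary:
        return primary.burned_in_mac        # active always uses the primary's MAC
    return secondary.burned_in_mac          # secondary booted alone: uses its own


primary = Unit("primary", "0011.2233.aaaa")      # hypothetical addresses
secondary = Unit("secondary", "0011.2233.bbbb")
VIRTUAL = "0011.2233.cccc"

# Secondary boots without detecting the primary, then the primary returns.
alone = active_mac(primary, secondary, secondary_knows_primary=False)
rejoined = active_mac(primary, secondary, secondary_knows_primary=True)
print(alone != rejoined)    # True: neighbors may hold a stale ARP entry

# The same sequence with a virtual MAC configured.
alone_v = active_mac(primary, secondary, False, VIRTUAL)
rejoined_v = active_mac(primary, secondary, True, VIRTUAL)
print(alone_v == rejoined_v)  # True: the MAC-to-IP pairing never changes
```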