
ASA5508 Active/Standby Automatic failback?

LindseyJGreen
Level 1

Hi, 

I have an Active/Standby pair of ASA 5508s running 9.16.2. I needed to reboot the primary (active) firewall, so I performed a stateful failover, which worked as expected.

I then rebooted the primary firewall and all traffic continued to work through the secondary unit.

The problem came when the primary started to come back up: the secondary switched back to standby, seemingly before the primary was ready, and we lost all connectivity briefly.

I've never known the firewalls to automatically failback, let alone do it before the firewalls are ready.

Below is the Failover History from the ASA's:

Primary:

From State                 To State                   Reason
==========================================================================
10:10:09 BST Apr 4 2023
Not Detected               Negotiation                No Error

10:10:54 BST Apr 4 2023
Negotiation                Just Active                No Active unit found

10:10:54 BST Apr 4 2023
Just Active                Active Drain               No Active unit found

10:10:54 BST Apr 4 2023
Active Drain               Active Applying Config     No Active unit found

10:10:54 BST Apr 4 2023
Active Applying Config     Active Config Applied      No Active unit found

10:10:54 BST Apr 4 2023
Active Config Applied      Active                     No Active unit found

==========================================================================

Secondary:

From State                 To State                   Reason
==========================================================================
10:02:34 BST Apr 4 2023
Standby Ready              Just Active                Set by the config command

10:02:34 BST Apr 4 2023
Just Active                Active Drain               Set by the config command

10:02:34 BST Apr 4 2023
Active Drain               Active Applying Config     Set by the config command

10:02:34 BST Apr 4 2023
Active Applying Config     Active Config Applied      Set by the config command

10:02:34 BST Apr 4 2023
Active Config Applied      Active                     Set by the config command

10:11:16 BST Apr 4 2023
Active                     Cold Standby               Failover state check

10:11:17 BST Apr 4 2023
Cold Standby               Sync Config                Failover state check

10:12:13 BST Apr 4 2023
Sync Config                Sync File System           Failover state check

10:12:13 BST Apr 4 2023
Sync File System           Bulk Sync                  Failover state check

10:12:26 BST Apr 4 2023
Bulk Sync                  Standby Ready              Failover state check

==========================================================================
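To make the timeline in the two histories easier to read, here is a minimal Python sketch that computes the gaps between the key transitions. The timestamps are copied from the output above; the variable names are my own, and the sketch assumes the clocks on both units are synchronized, which the thread does not confirm.

```python
from datetime import datetime

# The "BST" zone label is stripped before parsing; both units report the same zone.
FMT = "%H:%M:%S %b %d %Y"

def parse(ts: str) -> datetime:
    """Parse a history timestamp such as '10:10:54 BST Apr 4 2023'."""
    return datetime.strptime(ts.replace(" BST", ""), FMT)

# Timestamps copied from the failover histories above.
primary_started_negotiation = parse("10:10:09 BST Apr 4 2023")  # Not Detected -> Negotiation
primary_became_active = parse("10:10:54 BST Apr 4 2023")        # Negotiation -> ... -> Active
secondary_left_active = parse("10:11:16 BST Apr 4 2023")        # Active -> Cold Standby
secondary_standby_ready = parse("10:12:26 BST Apr 4 2023")      # Bulk Sync -> Standby Ready

negotiation = (primary_became_active - primary_started_negotiation).total_seconds()
overlap = (secondary_left_active - primary_became_active).total_seconds()
resync = (secondary_standby_ready - secondary_left_active).total_seconds()

print(f"Primary spent {negotiation:.0f} s in Negotiation before going Active")
print(f"Both units reported Active for {overlap:.0f} s")
print(f"Secondary took {resync:.0f} s to return to Standby Ready")
```

If the clocks do agree, the primary went Active after 45 seconds without finding an active peer, and both units then considered themselves Active for about 22 seconds before the secondary stepped down.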

 

Below is our config:

Primary:

failover
failover lan unit primary
failover lan interface Failover GigabitEthernet1/8
failover link Failover GigabitEthernet1/8
failover interface ip Failover 172.16.254.1 255.255.255.252 standby 172.16.254.2
no failover wait-disable
no monitor-interface Staff-Wifi
no monitor-interface service-module

Secondary:

failover
failover lan unit secondary
failover lan interface Failover GigabitEthernet1/8
failover link Failover GigabitEthernet1/8
failover interface ip Failover 172.16.254.1 255.255.255.252 standby 172.16.254.2
no failover wait-disable
no monitor-interface Staff-Wifi
no monitor-interface service-module
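As a quick sanity check on the failover link addressing, the following Python sketch (standard library only, variable names are my own) confirms that the active and standby addresses from the "failover interface ip" line are the two usable hosts of the same /30. It only checks the arithmetic of the addresses shown above and says nothing about the state of the link itself.

```python
import ipaddress

# Values copied from the "failover interface ip" line in the config above.
link = ipaddress.ip_interface("172.16.254.1/255.255.255.252")
standby = ipaddress.ip_address("172.16.254.2")

# A /30 has exactly two usable host addresses.
usable = list(link.network.hosts())

same_subnet = standby in link.network
distinct = standby != link.ip

print(f"Failover subnet: {link.network}")
print(f"Usable hosts: {[str(h) for h in usable]}")
print(f"Standby address in subnet and distinct from active: {same_subnet and distinct}")
```

The identical "failover interface ip" line on both units is consistent with the interface output later in the thread, where the primary holds 172.16.254.1 and the secondary holds 172.16.254.2.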

Any ideas what happened?

 

Thanks



The newly restarted unit should never claim the active role by itself, as there is no preemption on the ASA for failover. Looking at the messages, I would assume that something is not right with your HA link, GigabitEthernet1/8. Are the two units directly connected, or are they connected through a switched infrastructure? If there is a switch in between, are the ports configured for portfast?

Hi,

The failover is configured on port 1/8 (GigabitEthernet1/8) on both firewalls and is directly connected. There don't appear to be any issues with that link, and the firewalls showed the config and ports looking as they should when I checked with a "show fail" before performing the stateful failover. The initial stateful failover also worked without any issues (I didn't lose a single ping to the internet), and it was at least 10 minutes before the primary came back up and the issue occurred.

I've never seen this occur before and have performed numerous zero-downtime firewall upgrades both on this pair and many others and never had this outcome.

Do this on both firewalls:
interface GigabitEthernet1/8
no shut 

Why would I need to no shut an interface that's already up?

 

Interface GigabitEthernet1/8 "Failover", is up, line protocol is up
Hardware is Accelerator rev01, BW 1000 Mbps, DLY 10 usec
Auto-Duplex(Full-duplex), Auto-Speed(1000 Mbps)
Input flow control is unsupported, output flow control is off
Description: LAN/STATE Failover Interface
MAC address cc16.7e98.dd1b, MTU 1500
IP address 172.16.254.1, subnet mask 255.255.255.252
18753 packets input, 2349476 bytes, 0 no buffer
Received 56 broadcasts, 0 runts, 0 giants
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 pause input, 0 resume input
0 L2 decode drops
1465832 packets output, 1317433220 bytes, 0 underruns
0 pause output, 0 resume output
0 output errors, 0 collisions, 0 interface resets
0 late collisions, 0 deferred
0 input reset drops, 1 output reset drops
input queue (blocks free curr/low): hardware (1984/1918)
output queue (blocks free curr/low): hardware (2047/2008)
Traffic Statistics for "Failover":
18753 packets input, 2006214 bytes
1465832 packets output, 1291046460 bytes
0 packets dropped
1 minute input rate 1 pkts/sec, 129 bytes/sec
1 minute output rate 172 pkts/sec, 153593 bytes/sec
1 minute drop rate, 0 pkts/sec
5 minute input rate 1 pkts/sec, 184 bytes/sec
5 minute output rate 158 pkts/sec, 141132 bytes/sec
5 minute drop rate, 0 pkts/sec

Why would I need to no shut interfaces which are already up?

The interface is shut down by default, not up; that is why each firewall does not detect its mate.

So just no shut it.

The ports are not at their defaults; they were up, which is how the stateful failover occurred in the first place. The ports are live and were live when I initiated the failover; otherwise I would not have seen the standby unit as "Standby Ready".

Primary:

Interface GigabitEthernet1/8 "Failover", is up, line protocol is up
Hardware is Accelerator rev01, BW 1000 Mbps, DLY 10 usec
Auto-Duplex(Full-duplex), Auto-Speed(1000 Mbps)
Input flow control is unsupported, output flow control is off
Description: LAN/STATE Failover Interface
MAC address cc16.7e98.dd1b, MTU 1500
IP address 172.16.254.1, subnet mask 255.255.255.252
29222 packets input, 3681760 bytes, 0 no buffer
Received 59 broadcasts, 0 runts, 0 giants
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 pause input, 0 resume input
0 L2 decode drops
2305301 packets output, 2075831472 bytes, 0 underruns
0 pause output, 0 resume output
0 output errors, 0 collisions, 0 interface resets
0 late collisions, 0 deferred
0 input reset drops, 1 output reset drops
input queue (blocks free curr/low): hardware (2011/1918)
output queue (blocks free curr/low): hardware (2047/2008)
Traffic Statistics for "Failover":
29222 packets input, 3149288 bytes
2305301 packets output, 2034334216 bytes
0 packets dropped
1 minute input rate 2 pkts/sec, 437 bytes/sec
1 minute output rate 138 pkts/sec, 122693 bytes/sec
1 minute drop rate, 0 pkts/sec
5 minute input rate 1 pkts/sec, 182 bytes/sec
5 minute output rate 136 pkts/sec, 121221 bytes/sec
5 minute drop rate, 0 pkts/sec

Secondary:

Interface GigabitEthernet1/8 "Failover", is up, line protocol is up
Hardware is Accelerator rev01, BW 1000 Mbps, DLY 10 usec
Auto-Duplex(Full-duplex), Auto-Speed(1000 Mbps)
Input flow control is unsupported, output flow control is off
Description: LAN/STATE Failover Interface
MAC address 0081.c450.607d, MTU 1500
IP address 172.16.254.2, subnet mask 255.255.255.252
2681082808 packets input, 2377976899072 bytes, 0 no buffer
Received 420 broadcasts, 0 runts, 0 giants
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 pause input, 0 resume input
0 L2 decode drops
61111493 packets output, 7301123504 bytes, 0 underruns
0 pause output, 0 resume output
0 output errors, 0 collisions, 0 interface resets
0 late collisions, 0 deferred
0 input reset drops, 434 output reset drops
input queue (blocks free curr/low): hardware (2013/1889)
output queue (blocks free curr/low): hardware (2047/2027)
Traffic Statistics for "Failover":
2320992 packets input, 2004854428 bytes
29073 packets output, 3121166 bytes
0 packets dropped
1 minute input rate 133 pkts/sec, 115840 bytes/sec
1 minute output rate 1 pkts/sec, 120 bytes/sec
1 minute drop rate, 0 pkts/sec
5 minute input rate 135 pkts/sec, 117708 bytes/sec
5 minute output rate 1 pkts/sec, 182 bytes/sec
5 minute drop rate, 0 pkts/sec

 

Please can you do this:
capture HA-link interface Failover match ip host 172.16.254.1 host 172.16.254.2
(wait 5 minutes)
capture HA-link stop
show capture HA-link

Check whether both firewalls exchange hello messages during the time you run the failover test.
Thanks

Do I run this capture on the active firewall as I fail over?

On either one.

Both firewalls send and receive hello messages.

I performed a stateful failover to the secondary and rebooted the primary (now standby) without capturing, to see whether it was just a one-off, and lost service.

I then repeated this and captured the attached failover information.

Primary to secondary capture - just a standard stateful failover from primary active to secondary standby 

reboot of primary capture 1 - reboot of primary standby firewall - no loss of service

reboot of primary capture 2 - reboot of primary standby firewall - loss of service and primary became active itself.

There seems to be no issue with the failover link from these captures.

I'm at a loss as to why 2 out of 3 reboots of the primary standby cause loss of service. Could it be a bug?

The ports are not at their defaults. The ports are live and are actively being used to send the config to the standby. The failover link has to be working in order to perform a stateful failover, which it did. Entering no shut has no effect on interfaces that are already live.

I found that this is a bug related to certain NAT rules. The bug causes a split-brain condition: bug CSCwb32841.

I used Cisco's CLI Analyzer to find the bug.

IMO, having (any, any) in a NAT statement is always a misconfiguration.
