04-04-2023 03:01 AM
Hi,
I have a pair of Active/Standby ASA 5508s running 9.16.2. I needed to reboot the primary (active) firewall, so I performed a stateful failover, which worked as expected.
I then rebooted the primary firewall and all traffic continued to work through the secondary unit.
The problem came when the primary started to come back up: the secondary switched back to standby, seemingly before the primary was ready, and we briefly lost all connectivity.
I've never known the firewalls to fail back automatically, let alone do it before the returning unit is ready.
Below is the failover history from both ASAs:
Primary:
From State                 To State                   Reason
==========================================================================
10:10:09 BST Apr 4 2023
Not Detected               Negotiation                No Error
10:10:54 BST Apr 4 2023
Negotiation                Just Active                No Active unit found
10:10:54 BST Apr 4 2023
Just Active                Active Drain               No Active unit found
10:10:54 BST Apr 4 2023
Active Drain               Active Applying Config     No Active unit found
10:10:54 BST Apr 4 2023
Active Applying Config     Active Config Applied      No Active unit found
10:10:54 BST Apr 4 2023
Active Config Applied      Active                     No Active unit found
==========================================================================
Secondary:
10:02:34 BST Apr 4 2023
Standby Ready              Just Active                Set by the config command
10:02:34 BST Apr 4 2023
Just Active                Active Drain               Set by the config command
10:02:34 BST Apr 4 2023
Active Drain               Active Applying Config     Set by the config command
10:02:34 BST Apr 4 2023
Active Applying Config     Active Config Applied      Set by the config command
10:02:34 BST Apr 4 2023
Active Config Applied      Active                     Set by the config command
10:11:16 BST Apr 4 2023
Active                     Cold Standby               Failover state check
10:11:17 BST Apr 4 2023
Cold Standby               Sync Config                Failover state check
10:12:13 BST Apr 4 2023
Sync Config                Sync File System           Failover state check
10:12:13 BST Apr 4 2023
Sync File System           Bulk Sync                  Failover state check
10:12:26 BST Apr 4 2023
Bulk Sync                  Standby Ready              Failover state check
==========================================================================
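Lining up the two histories makes the timing easier to see. The following is only a rough Python sketch: the timestamps are copied from the output above, the helper name `seconds_between` and the variable names are made up for illustration, and it assumes both units' clocks are in sync.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def seconds_between(start: str, end: str) -> int:
    """Whole seconds from start to end; both times are on the same day."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Timestamps copied from the failover history above (all BST, Apr 4 2023).
# Assumption: the clocks on both units are synchronized.
primary_negotiation_start = "10:10:09"  # Primary: Not Detected -> Negotiation
primary_became_active = "10:10:54"      # Primary: Negotiation -> Just Active
secondary_left_active = "10:11:16"      # Secondary: Active -> Cold Standby

negotiation_secs = seconds_between(primary_negotiation_start, primary_became_active)
dual_active_secs = seconds_between(primary_became_active, secondary_left_active)

print(f"Primary spent {negotiation_secs}s in Negotiation before going active")
print(f"Both units reported Active for {dual_active_secs}s")
```

If the clocks do agree, the primary reported "No Active unit found" while the secondary's own history shows it was still Active, and both units stayed Active until the secondary dropped to Cold Standby.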
Below is our config:
Primary:
failover
failover lan unit primary
failover lan interface Failover GigabitEthernet1/8
failover link Failover GigabitEthernet1/8
failover interface ip Failover 172.16.254.1 255.255.255.252 standby 172.16.254.2
no failover wait-disable
no monitor-interface Staff-Wifi
no monitor-interface service-module
Secondary:
failover
failover lan unit secondary
failover lan interface Failover GigabitEthernet1/8
failover link Failover GigabitEthernet1/8
failover interface ip Failover 172.16.254.1 255.255.255.252 standby 172.16.254.2
no failover wait-disable
no monitor-interface Staff-Wifi
no monitor-interface service-module
Any ideas what happened?
Thanks
04-06-2023 12:06 AM
Found that this is a bug related to certain NAT rules. The bug causes split brain: Bug CSCwb32841.
I used Cisco's CLI Analyzer to find the bug.
04-04-2023 03:22 AM
The newly restarted unit should never claim the active role by itself, as there is no preemption on the ASA for failover. Looking at the messages, I would assume that something is not right with your HA link on GigabitEthernet1/8. Are the two units directly connected, or are they connected through a switched infrastructure? If there is a switch in between, are the ports configured for portfast?
04-04-2023 04:53 AM
Hi,
The failover is configured on port GigabitEthernet1/8 on both firewalls, and the units are directly connected. There don't appear to be any issues with that link, and the config and ports looked as they should when I checked with a "show fail" before performing the stateful failover. The initial stateful failover also worked without any issues (I didn't lose a single ping to the internet), and it was at least 10 minutes before the primary came back up and the issue occurred.
I've never seen this occur before and have performed numerous zero-downtime firewall upgrades both on this pair and many others and never had this outcome.
04-04-2023 05:03 AM
Do this on both firewalls:
interface GigabitEthernet1/8
no shut
04-04-2023 05:07 AM
Why would I need to no shut an interface that's already up?
Interface GigabitEthernet1/8 "Failover", is up, line protocol is up
Hardware is Accelerator rev01, BW 1000 Mbps, DLY 10 usec
Auto-Duplex(Full-duplex), Auto-Speed(1000 Mbps)
Input flow control is unsupported, output flow control is off
Description: LAN/STATE Failover Interface
MAC address cc16.7e98.dd1b, MTU 1500
IP address 172.16.254.1, subnet mask 255.255.255.252
18753 packets input, 2349476 bytes, 0 no buffer
Received 56 broadcasts, 0 runts, 0 giants
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 pause input, 0 resume input
0 L2 decode drops
1465832 packets output, 1317433220 bytes, 0 underruns
0 pause output, 0 resume output
0 output errors, 0 collisions, 0 interface resets
0 late collisions, 0 deferred
0 input reset drops, 1 output reset drops
input queue (blocks free curr/low): hardware (1984/1918)
output queue (blocks free curr/low): hardware (2047/2008)
Traffic Statistics for "Failover":
18753 packets input, 2006214 bytes
1465832 packets output, 1291046460 bytes
0 packets dropped
1 minute input rate 1 pkts/sec, 129 bytes/sec
1 minute output rate 172 pkts/sec, 153593 bytes/sec
1 minute drop rate, 0 pkts/sec
5 minute input rate 1 pkts/sec, 184 bytes/sec
5 minute output rate 158 pkts/sec, 141132 bytes/sec
5 minute drop rate, 0 pkts/sec
04-04-2023 06:28 AM
Why would I need to no shut interfaces which are already up?
04-04-2023 06:46 AM - edited 04-04-2023 06:46 AM
The interface is down by default, not up; that is why each firewall does not detect its mate.
So just no shut it.
04-04-2023 06:52 AM
The ports are not in their default state and were up, which is how the stateful failover occurred in the first place. The ports are live and were live when I initiated the failover; otherwise I would not have seen the standby unit as "standby ready".
Primary:
Interface GigabitEthernet1/8 "Failover", is up, line protocol is up
Hardware is Accelerator rev01, BW 1000 Mbps, DLY 10 usec
Auto-Duplex(Full-duplex), Auto-Speed(1000 Mbps)
Input flow control is unsupported, output flow control is off
Description: LAN/STATE Failover Interface
MAC address cc16.7e98.dd1b, MTU 1500
IP address 172.16.254.1, subnet mask 255.255.255.252
29222 packets input, 3681760 bytes, 0 no buffer
Received 59 broadcasts, 0 runts, 0 giants
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 pause input, 0 resume input
0 L2 decode drops
2305301 packets output, 2075831472 bytes, 0 underruns
0 pause output, 0 resume output
0 output errors, 0 collisions, 0 interface resets
0 late collisions, 0 deferred
0 input reset drops, 1 output reset drops
input queue (blocks free curr/low): hardware (2011/1918)
output queue (blocks free curr/low): hardware (2047/2008)
Traffic Statistics for "Failover":
29222 packets input, 3149288 bytes
2305301 packets output, 2034334216 bytes
0 packets dropped
1 minute input rate 2 pkts/sec, 437 bytes/sec
1 minute output rate 138 pkts/sec, 122693 bytes/sec
1 minute drop rate, 0 pkts/sec
5 minute input rate 1 pkts/sec, 182 bytes/sec
5 minute output rate 136 pkts/sec, 121221 bytes/sec
5 minute drop rate, 0 pkts/sec
Secondary:
Interface GigabitEthernet1/8 "Failover", is up, line protocol is up
Hardware is Accelerator rev01, BW 1000 Mbps, DLY 10 usec
Auto-Duplex(Full-duplex), Auto-Speed(1000 Mbps)
Input flow control is unsupported, output flow control is off
Description: LAN/STATE Failover Interface
MAC address 0081.c450.607d, MTU 1500
IP address 172.16.254.2, subnet mask 255.255.255.252
2681082808 packets input, 2377976899072 bytes, 0 no buffer
Received 420 broadcasts, 0 runts, 0 giants
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 pause input, 0 resume input
0 L2 decode drops
61111493 packets output, 7301123504 bytes, 0 underruns
0 pause output, 0 resume output
0 output errors, 0 collisions, 0 interface resets
0 late collisions, 0 deferred
0 input reset drops, 434 output reset drops
input queue (blocks free curr/low): hardware (2013/1889)
output queue (blocks free curr/low): hardware (2047/2027)
Traffic Statistics for "Failover":
2320992 packets input, 2004854428 bytes
29073 packets output, 3121166 bytes
0 packets dropped
1 minute input rate 133 pkts/sec, 115840 bytes/sec
1 minute output rate 1 pkts/sec, 120 bytes/sec
1 minute drop rate, 0 pkts/sec
5 minute input rate 135 pkts/sec, 117708 bytes/sec
5 minute output rate 1 pkts/sec, 182 bytes/sec
5 minute drop rate, 0 pkts/sec
04-04-2023 04:53 PM
Please can you run this:
capture HA-link interface Failover match ip host 172.16.254.1 host 172.16.254.2
wait 5 min
capture HA-link stop
show capture HA-link
Check whether both firewalls exchange hello messages at the time you do the failover test.
Thanks
04-05-2023 12:04 AM
Do I run this capture on the active firewall as I fail over?
04-05-2023 03:16 AM
On either one.
Both firewalls send and receive hello messages.
04-05-2023 11:07 AM
To see whether it was just a one-off, I performed a stateful failover to the secondary and rebooted the primary (now standby) without capturing, and lost service.
I then repeated this and captured the attached failover information.
Primary to secondary capture - just a standard stateful failover from primary active to secondary standby
reboot of primary capture 1 - reboot of primary standby firewall - no loss of service
reboot of primary capture 2 - reboot of primary standby firewall - loss of service, and the primary became active by itself.
There seems to be no issue with the failover link in these captures.
I'm at a loss as to why 2 out of 3 reboots of the primary standby cause loss of service. Could it be a bug?
04-04-2023 07:17 AM
The ports are not in their default state. They are live and actively being used to send the config to the standby. The failover link has to be working in order to perform a stateful failover, which it did. Entering no shut has no effect on interfaces that are already live.
04-06-2023 12:14 AM
IMO, having (any, any) in a NAT statement is always a misconfiguration.