01-28-2022 09:33 AM
Hello,
I witnessed a strange behaviour in two of our eight Firepower clusters.
Affected are one FTD 2110 HA cluster and one FTD 4110 HA cluster, both on 6.6.5.
After a major power outage in our DC, all systems went down.
(VxRail, switches, routers, firewalls all offline for an hour)
The staff on site switched the power back on and all devices came back online in no specific order.
Nothing was broken, but some clients couldn't reach their gateways.
It turned out that one DC cluster went active/active and the units didn't negotiate their HA state.
The GW IP was active on both FTDs and the clients lost connectivity from time to time.
"Switch Active Peer" had no effect and the sync between both FTDs didn't finish after 30 min., so we rebooted the "standby" FTD and left the active one up and running.
The reboot changed nothing and we had to power off the non-working "standby" FTD.
I did some research and found no specific bug.
I will try the following steps in a maintenance window next week:
- HA Suspend on the "active" FTD
(boot the "standby" and look for logs or crash reports)
- Resume HA
- Reboot both FTDs
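As a side note, the active/active condition described above (a split brain) shows up when `show failover` on each unit reports "This host: ... - Active". A minimal sketch of how one might flag that from collected outputs; the excerpt strings below are hypothetical, shaped like the real outputs later in this thread:

```python
import re

def failover_role(show_failover_output: str) -> str:
    """Extract the local unit's HA role (e.g. "Active", "Standby Ready")
    from the "This host:" line of `show failover` output."""
    m = re.search(r"This host: \w+ - (.+)", show_failover_output)
    return m.group(1).strip() if m else "Unknown"

# Hypothetical, anonymized excerpts -- in a split brain, both units
# independently claim the Active role after the outage.
primary = "This host: Primary - Active\n  Active time: 1065178 (sec)"
secondary = "This host: Secondary - Active\n  Active time: 900 (sec)"

roles = [failover_role(primary), failover_role(secondary)]
split_brain = roles.count("Active") > 1  # more than one Active = split brain
print(split_brain)  # True for this excerpt pair
```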
Any thoughts or suggestions would be helpful.
regards
Alex
01-28-2022 09:47 AM
Since there was an unexpected power outage, something might have crashed.
- Please confirm that Layer 2 between the devices is OK.
- For the offline unit: remove all connections, boot the device, and check that it boots as expected before you go to the next step.
01-30-2022 08:36 AM
Yes, both clusters are directly connected with an Ethernet cable.
01-28-2022 12:41 PM - edited 01-28-2022 01:11 PM
Hi Alex,
we faced similar issues on several customer deployments already after power outages and a simultaneous boot of the primary and secondary node.
Please send the output of
show failover
show failover history
from the primary and secondary appliances to validate my suspicion.
If you notice something like this on the secondary node:
> show failover
Failover Off (pseudo-Standby)
Failover unit Secondary
Failover LAN Interface: failover-link Ethernet1/8 (up)
Reconnect timeout 0:00:00
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 3 of 1288 maximum
MAC Address Move Notification Interval not set
> show failover history
==========================================================================
From State                 To State                 Reason
==========================================================================
16:03:29 UTC Jul 14 2021
Disabled                   Negotiation              Set by the config command
16:03:31 UTC Jul 14 2021
Negotiation                Cold Standby             Detected an Active mate
16:03:32 UTC Jul 14 2021
Cold Standby               App Sync                 Detected an Active mate
16:04:05 UTC Jul 14 2021
App Sync                   Disabled                 CD App Sync error is App Config Apply Failed
16:06:17 UTC Jul 14 2021
Disabled                   Negotiation              Set by the config command
16:06:19 UTC Jul 14 2021
Negotiation                Cold Standby             Detected an Active mate
16:06:20 UTC Jul 14 2021
Cold Standby               App Sync                 Detected an Active mate
16:06:54 UTC Jul 14 2021
App Sync                   Disabled                 CD App Sync error is App Config Apply Failed
==========================================================================
You should be able to let the secondary node resync with primary via command
config high-availability resume
Validate via
show failover
show failover history
If you rebooted the secondary while it was in pseudo-Standby state, it might actually be in Failover Off state. In this case you will have to:
+ break the HA
+ de-register the affected device and register it again
+ add it back to the HA pair
+ if the device is still not able to sync after this, it most likely needs to be reimaged
Especially the FAQ section.
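For illustration, the telltale transition in that history output is "App Sync -> Disabled" with the reason "App Config Apply Failed". A small sketch that flags it, assuming you have the history collected as text (the excerpt is taken from the sample output above):

```python
import re

# Excerpt of `show failover history` from a unit stuck in pseudo-Standby
# (copied from the sample output above).
HISTORY = """\
16:03:32 UTC Jul 14 2021
Cold Standby               App Sync                 Detected an Active mate
16:04:05 UTC Jul 14 2021
App Sync                   Disabled                 CD App Sync error is App Config Apply Failed
"""

def app_sync_failed(history: str) -> bool:
    """Return True if any transition went App Sync -> Disabled because
    the application config apply failed."""
    return bool(re.search(r"App Sync\s+Disabled\s+.*App Config Apply Failed",
                          history))

print(app_sync_failed(HISTORY))  # True for the excerpt above
```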
Best regards
Stefan
01-30-2022 08:49 AM
Hi Stefan,
here are my findings so far ...
#2110
Cisco Fire Linux OS v6.6.5 (build 13)
Cisco Firepower 2110 Threat Defense v6.6.5.1 (build 15)
> show failover
descriptor exec history interface state statistics |
> show failover history
==========================================================================
From State To State Reason
==========================================================================
08:29:15 UTC Jan 18 2022
Not Detected Disabled No Error
08:29:23 UTC Jan 18 2022
Disabled Negotiation Set by the config command
08:30:08 UTC Jan 18 2022
Negotiation Just Active No Active unit found
08:30:09 UTC Jan 18 2022
Just Active Active Drain No Active unit found
08:30:09 UTC Jan 18 2022
Active Drain Active Applying Config No Active unit found
08:30:09 UTC Jan 18 2022
Active Applying Config Active Config Applied No Active unit found
08:30:09 UTC Jan 18 2022
Active Config Applied Active No Active unit found
==========================================================================
>
>
>
> show failover
Failover On
Failover unit Primary
Failover LAN Interface: Failover Ethernet1/12 (down)
Reconnect timeout 0:00:00
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 3 of 1292 maximum
MAC Address Move Notification Interval not set
failover replication http
Version: Ours 9.14(3)15, Mate 9.14(3)15
Serial Number: Ours ###########, Mate Unknown
Last Failover at: 08:30:09 UTC Jan 18 2022
This host: Primary - Active
Active time: 1065178 (sec)
slot 0: FPR-2110 hw/sw rev (1.1/9.14(3)15) status (Up Sys)
Interface outside (XXX.XXX.XXX.1): Unknown (Waiting)
Interface inside (XXX.XXX.YYY.81): Unknown (Waiting)
Interface diagnostic (0.0.0.0): Unknown (Waiting)
slot 1: snort rev (1.0) status (up)
slot 2: diskstatus rev (1.0) status (up)
Other host: Secondary - Failed
Active time: 0 (sec)
slot 0: FPR-2110 hw/sw rev (1.1/9.14(3)15) status (Unknown/Unknown)
Interface outside (XXX.XXX.XXX.2): Unknown (Waiting)
Interface inside (XXX.XXX.YYY.82): Unknown (Waiting)
Interface diagnostic (0.0.0.0): Unknown (Waiting)
slot 1: snort rev (1.0) status (up)
slot 2: diskstatus rev (1.0) status (up)
Stateful Failover Logical Update Statistics
Link : Failover Ethernet1/12 (down)
Stateful Obj xmit xerr rcv rerr
General 0 0 0 0
sys cmd 0 0 0 0
up time 0 0 0 0
RPC services 0 0 0 0
TCP conn 0 0 0 0
UDP conn 0 0 0 0
ARP tbl 0 0 0 0
Xlate_Timeout 0 0 0 0
IPv6 ND tbl 0 0 0 0
VPN IKEv1 SA 0 0 0 0
VPN IKEv1 P2 0 0 0 0
VPN IKEv2 SA 0 0 0 0
VPN IKEv2 P2 0 0 0 0
VPN CTCP upd 0 0 0 0
VPN SDI upd 0 0 0 0
VPN DHCP upd 0 0 0 0
SIP Session 0 0 0 0
SIP Tx 0 0 0 0
SIP Pinhole 0 0 0 0
Route Session 0 0 0 0
Router ID 0 0 0 0
User-Identity 0 0 0 0
CTS SGTNAME 0 0 0 0
CTS PAC 0 0 0 0
TrustSec-SXP 0 0 0 0
IPv6 Route 0 0 0 0
STS Table 0 0 0 0
Rule DB B-Sync 0 0 0 0
Rule DB P-Sync 0 0 0 0
Rule DB Delete 0 0 0 0
Logical Update Queue Information
Cur Max Total
Recv Q: 0 0 0
Xmit Q: 0 0 0
>
#4110
> show failover history
==========================================================================
From State To State Reason
==========================================================================
10:04:50 CET Jan 15 2022
Not Detected Disabled No Error
10:04:52 CET Jan 15 2022
Disabled Negotiation Set by the config command
10:05:07 CET Jan 15 2022
Negotiation Just Active No Active unit found
10:05:07 CET Jan 15 2022
Just Active Active Drain No Active unit found
10:05:07 CET Jan 15 2022
Active Drain Active Applying Config No Active unit found
10:05:07 CET Jan 15 2022
Active Applying Config Active Config Applied No Active unit found
10:05:07 CET Jan 15 2022
Active Config Applied Active No Active unit found
==========================================================================
>
> show failover
Failover On
Failover unit Primary
Failover LAN Interface: Failover Port-channel2 (down)
Reconnect timeout 0:00:00
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 30 of 1291 maximum
MAC Address Move Notification Interval not set
failover replication http
Version: Ours 9.14(3)15, Mate 9.14(3)15
Serial Number: Ours ############, Mate Unknown
Last Failover at: 10:05:07 CET Jan 15 2022
This host: Primary - Active
Active time: 1323243 (sec)
slot 0: UCSB-B200-M3-U hw/sw rev (0.0/9.14(3)15) status (Up Sys)
Interface A (x.x.x.1): Normal (Waiting)
Interface A (x.x.x.1): Normal (Waiting)
Interface A (x.x.x.1): Normal (Waiting)
Interface A (x.x.x.1): Normal (Waiting)
Interface A (x.x.x.1): Normal (Waiting)
Interface V (x.x.x.1): Normal (Waiting)
Interface V (x.x.x.1): Normal (Waiting)
Interface V (x.x.x.1): Normal (Waiting)
Interface V (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.11): Normal (Waiting)
Interface D (x.x.x.11): Normal (Waiting)
Interface D (x.x.x.11): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface D (x.x.x.1): Normal (Waiting)
Interface diagnostic (0.0.0.0): Unknown (Waiting)
slot 1: snort rev (1.0) status (up)
slot 2: diskstatus rev (1.0) status (up)
Other host: Secondary - Failed
Active time: 0 (sec)
slot 0: UCSB-B200-M3-U hw/sw rev (0.0/9.14(3)15) status (Unknown/Unknown)
Interface A (x.x.x.2): Unknown (Waiting)
Interface A (x.x.x.2): Unknown (Waiting)
Interface A (x.x.x.2): Unknown (Waiting)
Interface A (x.x.x.2): Unknown (Waiting)
Interface A (x.x.x.2): Unknown (Waiting)
Interface V (x.x.x.2): Unknown (Waiting)
Interface V (x.x.x.2): Unknown (Waiting)
Interface V (x.x.x.2): Unknown (Waiting)
Interface V (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.12): Unknown (Waiting)
Interface D (x.x.x.12): Unknown (Waiting)
Interface D (x.x.x.12): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface D (x.x.x.2): Unknown (Waiting)
Interface diagnostic (0.0.0.0): Unknown (Waiting)
slot 1: snort rev (1.0) status (up)
slot 2: diskstatus rev (1.0) status (up)
Stateful Failover Logical Update Statistics
Link : Failover Port-channel2 (down)
Stateful Obj xmit xerr rcv rerr
General 0 0 0 0
sys cmd 0 0 0 0
up time 0 0 0 0
RPC services 0 0 0 0
TCP conn 0 0 0 0
UDP conn 0 0 0 0
ARP tbl 0 0 0 0
Xlate_Timeout 0 0 0 0
IPv6 ND tbl 0 0 0 0
VPN IKEv1 SA 0 0 0 0
VPN IKEv1 P2 0 0 0 0
VPN IKEv2 SA 0 0 0 0
VPN IKEv2 P2 0 0 0 0
VPN CTCP upd 0 0 0 0
VPN SDI upd 0 0 0 0
VPN DHCP upd 0 0 0 0
SIP Session 0 0 0 0
SIP Tx 0 0 0 0
SIP Pinhole 0 0 0 0
Route Session 0 0 0 0
Router ID 0 0 0 0
User-Identity 0 0 0 0
CTS SGTNAME 0 0 0 0
CTS PAC 0 0 0 0
TrustSec-SXP 0 0 0 0
IPv6 Route 0 0 0 0
STS Table 0 0 0 0
Rule DB B-Sync 0 0 0 0
Rule DB P-Sync 0 0 0 0
Rule DB Delete 0 0 0 0
Logical Update Queue Information
Cur Max Total
Recv Q: 0 0 0
Xmit Q: 0 0 0
>
01-30-2022 11:04 AM
Failover On
Failover unit Primary
Failover LAN Interface: Failover Port-channel2 (down)
Failover On
Failover unit Primary
Failover LAN Interface: Failover Ethernet1/12 (down)
Is this output from the primaries of two different clusters? Both appliances are Primary, and these seem to be a 2110 and a 4110.
Can you please add the output from the secondary appliances of both clusters?
02-01-2022 03:43 AM
Yes, these are the outputs from two different clusters.
On Wednesday I have a maintenance window and can switch on the standby FTD again.
I will report my findings.
01-29-2022 04:53 AM
I have seen similar behaviour and ended up upgrading the FMC/firewalls to 7.0.1, which seems to have fixed the issue.
09-21-2022 01:12 PM
We had to break the HA, reimage the Firepower, and rebuild the HA.