05-18-2016 04:20 PM - edited 03-08-2019 05:49 AM
Forgive me if I'm posting in the wrong community.
Recently, I've had to take over ownership of a pair of Cisco ASA 5540s running ASA version 8.4(7)26. The previous administrators are no longer around. I'm very much a newbie with Cisco, but I've managed SonicWALL and ISA appliances/servers in the past. While trying to establish management connectivity to one of the appliances, I turned on the management interface and did a write memory. The problem was that I did it on the standby (Secondary) member, not the active (Primary) one. I know, stupid, huh? But I didn't know you shouldn't do it, and I'm surprised the OS allowed it. That was last week.
Today, I was getting ready to update the ASA version (like I've done on a few other clusters), and when I ran a show failover I noticed that the Primary appliance was in a failed state and the Secondary was now active.
I've read through a couple of forums and articles, and this is the troubleshooting I've done so far:
1. Rebooted Primary/Failed appliance - no joy
2. Performed a "WRITE STANDBY" on the Secondary/Active appliance. Afterwards, show failover on the Primary reports it as ready/standing by, but after a few minutes it fails again
3. Turned off both the Secondary and the Primary, turned on the Primary first, waited a bit, then turned on the Secondary. Both show green Active lights
4. Compared both "SHOW RUN" outputs and found a difference in the LAN Failover Interface: the Secondary shows speed 1000 and duplex full, while the Primary shows nothing under the interface aside from the description. This is odd because I'm not seeing that difference in the show runs on other clusters, but then again they are running the newest ASA version available for that model. The cryptochecksum (near the last line of show run) also differs, but I found that's normal because the other working clusters I now own show the same difference. And of course the device names differ.
5. I tried to update the Primary with the same interface settings as the secondary but got a message that I needed to break the cluster first (something like that).
So, at this point I'm at a loss on how to force them to sync. I'd rather not break the pair and re-do it because I've never set up a cluster before (by myself). At this point my Secondary shows active and the Primary is Failed.
Any help would be very much appreciated.
PS Please don't ask for my show run, as per policy I can't provide it in an open forum. Also, I've spent multiple hours googling for the answer, but maybe my search query is not properly worded.
05-19-2016 08:29 AM
Hi
Is the failover link connected directly between the two units, or does it pass through switches?
Are you able to provide the output of show failover, show failover history, and show failover state?
I'm not asking for the complete show run, just the failover part if possible.
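For reference, a minimal sketch of collecting that output. It assumes you are already in privileged EXEC mode (after enable) on the unit; lines starting with ! are annotations only, not commands.

```
! Run on each unit that is powered on, from privileged EXEC mode
show failover
show failover history
show failover state
```

None of these commands changes the configuration, so they should be safe to run during business hours.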
05-19-2016 09:27 AM
Sure can, though I did need to scrub it slightly. I do want to advise that the Primary appliance is now turned off to avoid issues, since last night (after I powered on the units in a different order) both came up as active according to the appliance lights. Thanks for the help!
SHOW FAILOVER:
Failover On
Failover unit Secondary
Failover LAN Interface: failover GigabitEthernet0/2 (Failed - No Switchover)
Unit Poll frequency 5 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 3 of 210 maximum
Version: Ours 8.4(7)26, Mate Unknown
Last Failover at: 13:48:50 MST May 18 2016
        This host: Secondary - Active
                Active time: 70385 (sec)
                slot 0: ASA5540 hw/sw rev (2.0/8.4(7)26) status (Up Sys)
                  Interface production (x.11.254.143): Unknown (Waiting)
                  Interface labs (x.11.34.10): Unknown (Waiting)
                  Interface management (192.168.1.1): No Link (Waiting)
                slot 1: empty
        Other host: Primary - Failed
                Active time: 0 (sec)
                slot 0: empty
                  Interface production (x.11.254.144): Unknown (Waiting)
                  Interface labs (x.11.34.11): Unknown (Waiting)
                  Interface management (0.0.0.0): Unknown (Waiting)
                slot 1: empty
Stateful Failover Logical Update Statistics
        Link : stateful GigabitEthernet0/3 (down)
        Stateful Obj    xmit       xerr       rcv        rerr
        General         0          0          0          0
        sys cmd         0          0          0          0
        up time         0          0          0          0
        RPC services    0          0          0          0
        TCP conn        0          0          0          0
        UDP conn        0          0          0          0
        ARP tbl         0          0          0          0
        Xlate_Timeout   0          0          0          0
        IPv6 ND tbl     0          0          0          0
        VPN IKEv1 SA    0          0          0          0
        VPN IKEv1 P2    0          0          0          0
        VPN IKEv2 SA    0          0          0          0
        VPN IKEv2 P2    0          0          0          0
        VPN CTCP upd    0          0          0          0
        VPN SDI upd     0          0          0          0
        VPN DHCP upd    0          0          0          0
        SIP Session     0          0          0          0
        Route Session   0          0          0          0
        User-Identity   0          0          0          0
        Logical Update Queue Information
                        Cur     Max     Total
        Recv Q:         0       0       0
        Xmit Q:         0       0       0
SHOW FAILOVER HISTORY:
==========================================================================
From State                 To State                   Reason
==========================================================================
13:47:54 MST May 18 2016
Not Detected               Negotiation                No Error
13:48:50 MST May 18 2016
Negotiation                Just Active                No Active unit found
13:48:50 MST May 18 2016
Just Active                Active Drain               No Active unit found
13:48:50 MST May 18 2016
Active Drain               Active Applying Config     No Active unit found
13:48:50 MST May 18 2016
Active Applying Config     Active Config Applied      No Active unit found
13:48:50 MST May 18 2016
Active Config Applied      Active                     No Active unit found
==========================================================================
SHOW FAILOVER STATE
               State          Last Failure Reason      Date/Time
This host  -   Secondary
               Active         None
Other host -   Primary
               Failed         Comm Failure             13:49:10 MST May 18 2016
====Configuration State===
====Communication State===
05-19-2016 09:50 AM
Hi
Since the other unit is powered off, we can't really see what's going on. Everything in the output looks normal (it's the default behavior when the peer unit is missing). Maybe you can reconnect it outside business hours to collect some output.
Could you run this command:
- show running-config failover
You can also try issuing failover reset to move the failed unit back to an unfailed state.
The command can be run on either unit, but the recommendation is to issue it on the active one, which then resets the failed state of the standby unit.
Be sure to run that command outside business hours.
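A minimal sketch of that step, assuming both units are powered on and the active unit is the Secondary, as in your output. Lines starting with ! are annotations only.

```
! On the active unit (currently the Secondary), from privileged EXEC mode
failover reset
! Then check whether the peer has left the Failed state
show failover state
show failover history
```

If the Primary drops back to Failed after a few minutes, the history output from both units should show the reason for the transition.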
Thanks
05-19-2016 09:57 AM
Thanks for the assistance. I'll try out your suggestions within the next few days and report back with what I can.
Thanks again!
05-19-2016 10:00 AM
Ok good. You're very welcome.
06-13-2016 01:44 PM
So, I haven't performed a "failover reset" yet, but I diff'd the two show runs and found that the only difference (aside from the serial number, the failover lan unit entry, and the cryptochecksum) is the LAN Failover Interface. The current active unit (my Secondary) shows:
interface GigabitEthernet0/2
description LAN Failover Interface
speed 1000
duplex full
!
But my failed standby unit (my Primary) shows:
interface GigabitEthernet0/2
description LAN Failover Interface
!
When I try to add the speed and duplex I get an error message stating:
ERROR: Interface is in use by failover. You must disable failover first to execute this command
Should I proceed with the failover reset, and if so from which unit (primary or secondary)?
06-13-2016 02:00 PM
Yes, you can run the reset from the active unit. Normally it should take effect on the standby unit.
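If the reset alone does not keep the Primary out of the Failed state, the error message you quoted says failover has to be disabled on that unit before the failover interface can be changed. A rough sketch of that sequence follows. It assumes the goal is only to make GigabitEthernet0/2 on the Primary match the Secondary's speed 1000 / duplex full; nothing in this thread confirms that the mismatch is the actual cause. Lines starting with ! are annotations only.

```
! On the Primary (failed) unit only, ideally from the console
configure terminal
! Disable failover on this unit so the failover interface can be edited
no failover
interface GigabitEthernet0/2
 speed 1000
 duplex full
 exit
! Re-enable failover and let the unit renegotiate with its peer
failover
end
show failover
```

Since both units showed Active at one point earlier, it may be safer to do this in a maintenance window and keep the Primary's data interfaces disconnected while failover is disabled on it.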