ASA Failover with HSRP issues

Kyle Smith · ‎10-22-2012

Good Morning,

This past weekend I scheduled some maintenance to move our LAN gateway off our ASA 5520s and onto our collapsed core, which consists of 3750 stacks. I was able to get the gateway moved down onto the collapsed core, the point-to-point network between the ASAs worked and was able to route packets, and the internal devices were able to get out to the internet. I then tried to fail over to the other ASA just to make sure everything was working properly, and this is where my project hit a wall.

When failing over to the other unit, the primary IP address for some reason didn't float over to the secondary. For example, let's say I used 10.1.0.0/28, where the primary ASA IP is 10.1.0.1 with a standby of 10.1.0.2, and the two internal 3750 stacks had IPs of 10.1.0.4 and 10.1.0.5 with an HSRP instance using the 10.1.0.3 IP.

So using the IPs above, when working off the primary (let's call it ASA1) everything worked properly and I was able to ping the secondary ASA. But when I failed over to the secondary (ASA2), ASA1 got the IP of 10.1.0.2 but ASA2 didn't get the IP of 10.1.0.1, and I wasn't able to get into it since the IP didn't move over.

The odd thing about all of this is that the config changes were being sent to the other ASA, so I am confused as to why it didn't work. Both 3750 stacks had a route statement of 0.0.0.0 0.0.0.0 10.1.0.1, which I believe is correct, and the ASA had a route back into the LAN that seemed to work properly off the ASA. When I rolled back, the gateway on the ASAs was pingable regardless of which one was the primary.

Has anyone experienced this issue before and can shed some light on it, or explain what I did wrong? I can give more information if needed.

Thanks,

Marvin Rhoads · ‎10-22-2012

How did you effect the failover? Did you run "show failover" and/or "show ip address" when you were having the problem?

From the high-level description it sounds as if you did it right. I could speculate about ARP cache entries but it would be a shot in the dark...

Kyle Smith · ‎10-22-2012

Hello Marvin,

Thanks for your reply. No failover commands were issued during our maintenance, so I feel that the failover config shouldn't have been affected. I guess the only part of the failover config that would have changed is the primary and standby IP addresses for the inside interfaces, although I figured that wouldn't have been an issue. I did issue show failover and show ip address commands; I wish I had captured the output.

From what I remember, show failover showed the correct state. By that I mean that after failing over to the secondary, it showed the Primary as standby and the Secondary as active. When I did a show ip address on the Primary I got an IP of 10.1.0.2, but I couldn't SSH to or ping 10.1.0.1, which should have been the IP on the active ASA.

I guess the more I talk it out, it may be a routing problem? On the ASA, for routing to the default gateway I moved down (using fake IPs again), I had:

route inside 10.2.0.0 255.255.0.0 10.1.0.3

and from both the cores I had:

ip route 0.0.0.0 0.0.0.0 10.1.0.1

Marvin Rhoads · ‎10-22-2012

I asked "effect" not "affect". In other words what did you do to make the active node role switch to be on the secondary unit?

If the secondary unit believed it was not ready to take on the active role (usually due to a monitored interface being unavailable), it could mess things up.

If there's a good failover link up and for whatever reason you can't get onto one of the units, you can sometimes get good results with the "failover exec ..." command to pull information from the other unit.
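As a rough illustration of that last suggestion, and assuming the failover link is up and you have a privileged EXEC session on one unit, something like the following could be run from that unit to query its peer. The particular show commands chosen here are only examples.

```
! Run from whichever unit you can still reach.
! "mate" sends the command to the peer unit over the failover link.
failover exec mate show failover state
failover exec mate show ip address
failover exec mate show interface ip brief
```

Comparing the peer's view of its interface addresses with the local "show ip address" output should show whether the active and standby addresses actually swapped.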

Kyle Smith · ‎10-22-2012

Doh, silly me. In ASDM on the primary ASA I clicked the Make Standby button. The only reason I used ASDM was for screenshot purposes for our prod controls team.

Your comment about a monitored interface being unavailable made me think of something. I am using HSRP on the core layer so that the 10.1.0.3 IP always remains up. Is it possible that the 10.1.0.1 IP moved to the secondary but it wasn't ready to take on the active role because the core stack connected to ASA1, not the one connected to ASA2, was still answering the 10.1.0.3 traffic?
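For reference, the same switchover can normally be triggered from the CLI. A minimal sketch, assuming Active/Standby failover as in this thread; use one command or the other depending on which unit you are logged in to.

```
! On the currently active unit: give up the active role
no failover active

! Or, on the currently standby unit: take over the active role
failover active
```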

Ah, thank you for the advice. That is good to know because I was very curious about the other ASA.

Kyle Smith · ‎10-22-2012

I did some googling and found some potential issues. Here is an example of my HSRP commands issued on the Core level 3750 stacks.

###Main Core###
ip route 0.0.0.0 0.0.0.0 10.1.0.1

interface GigabitEthernet 1/0/1
 switchport mode access
 switchport access vlan 902

interface vlan 902
 ip address 10.1.0.4 255.255.255.240
 standby 1 ip 10.1.0.3
 standby 1 priority 200
 standby 1 preempt
 standby 1 authentication md5 key-string XXXXX

interface vlan 10
 ip address 10.10.0.2 255.255.0.0
 standby 1 ip 10.10.0.1
 standby 1 preempt
 standby 1 priority 200
 standby 1 authentication md5 key-string XXXXXX

###Secondary Core###
ip route 0.0.0.0 0.0.0.0 10.1.0.1

interface GigabitEthernet 1/0/22
 switchport mode access
 switchport access vlan 902

interface vlan 902
 ip address 10.1.0.5 255.255.255.240
 standby 1 ip 10.1.0.3
 standby 1 preempt
 standby 1 authentication md5 key-string XXXXXXXXX

interface vlan 10
 ip address 10.10.0.3 255.255.0.0
 standby 1 ip 10.10.0.1
 standby 1 preempt
 standby 1 authentication md5 key-string XXXXXXX

Is the issue I am running into caused by me not adding a tracking command to SVI 902? If it is, how does the switch know to fail over the 10.1.0.3 IP if it's a software failover and there isn't a disconnect between the ASA and the core?

Kyle Smith · ‎10-23-2012

Should this go under the Route/Switching forum? I haven't heard back from anyone and don't want to double post.

Marvin Rhoads · ‎10-23-2012

It's a user forum. We try to chime in as our day job allows.

Your HSRP should not have to fail over. HSRP operates independently of ASA failover. Failover of the ASA is mostly about the downstream switch recognizing that the MAC address associated with the active ASA inside address is now out the port connected to ASA2 vice ASA1. Since ASAs use a virtual MAC which is on the Active unit (primary or secondary as the case may be), that should not be an issue for you. (Reference)
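A minimal sketch of how that could be checked from a core switch right after a failover, using the example addresses from this thread. The MAC address shown is a placeholder, not a value from the original posts; substitute whatever the ARP entry returns.

```
! On the core switch, after failing over to ASA2
show ip arp 10.1.0.1
! Take the MAC from the ARP entry (0011.2233.4455 is a placeholder)
show mac address-table address 0011.2233.4455 vlan 902
```

If the MAC is still learned on the port facing ASA1 rather than on the path toward ASA2 (directly, or via the inter-switch trunk), that would point at a Layer 2 issue rather than routing.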

Kyle Smith · ‎10-24-2012

Ah my mistake Marvin, I thought there were staff members as well on these forums. I appreciate the time you are taking out of your day to respond!

OK, so do you think I have an incorrect routing statement somewhere? I guess I am just stumped as to why this isn't working; it must be a single line of config or something I am not doing properly. I don't get why the 10.1.0.1 address seems to drop off the network on a failover.

What I was thinking, up until reading your reference and your post, was that when I failed over so that ASA2 was active, the core stack in the other building had the 10.1.0.3 address, and in order to route traffic out with my route 0.0.0.0 0.0.0.0 10.1.0.1 command it for some reason wasn't trying to send it across the trunk to the core in the other building and then up to ASA2.

Marvin Rhoads · ‎10-24-2012

There are Cisco staff around but there's no SLA or obligation for them to reply here. The TAC (if you have support contract coverage) is the avenue for a guaranteed Cisco response.

It smells more like an L2 problem to me.

Does the inter-switch trunk allow VLAN 902? (It should if the HSRP group is forming properly.)

Can the main core switch reach the standby ASA 10.1.0.2 address?

Is failover state healthy on the ASAs? What does "show failover" report?
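Those three checks map roughly onto the commands below. This is only a sketch using the example addresses from this thread; "show standby brief" is added as one way to confirm that the HSRP group has formed.

```
! Main core 3750 stack
show interfaces trunk
show standby brief
ping 10.1.0.2

! Either ASA
show failover
```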

Kyle Smith · ‎10-24-2012

Yes, when doing a show interfaces trunk I see the VLAN allowed on the inter-switch trunk.

Here is the show failover you were asking for. Since we had to roll back I don't know if it will help illustrate what went wrong on Saturday. I did bold a section that looks odd to me because the standby should be 10.10.0.2. I will see what you think though.

FW01# show failover
Failover On
Failover unit Primary
Failover LAN Interface: failover GigabitEthernet0/3 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 3 of 160 maximum
Version: Ours 8.4(3), Mate 8.4(3)
Last Failover at: 06:40:31 EDT Oct 20 2012
    This host: Primary - Active
        Active time: 10031620 (sec)
        slot 0: ASA5520 hw/sw rev (2.0/8.4(3)) status (Up Sys)
          Interface outside (X.X.X.X): Normal (Monitored)
          Interface inside (10.10.0.1): Normal (Waiting)
          Interface dmz (X.X.X.X): Normal (Monitored)
        slot 1: ASA-SSM-20 hw/sw rev (1.0/7.0(7)E4) status (Up/Up)
          IPS, 7.0(7)E4, Up
    Other host: Secondary - Standby Ready
        Active time: 11034958 (sec)
        slot 0: ASA5520 hw/sw rev (2.0/8.4(3)) status (Up Sys)
          Interface outside (X.X.X.X): Normal (Monitored)
          Interface inside (0.0.0.0): Normal (Waiting)
          Interface dmz (X.X.X.X): Normal (Monitored)
        slot 1: ASA-SSM-20 hw/sw rev (1.0/7.0(7)E4) status (Up/Up)
          IPS, 7.0(7)E4, Up

Stateful Failover Logical Update Statistics
    Link : failover GigabitEthernet0/3 (up)
    Stateful Obj    xmit        xerr  rcv         rerr
    General         595126873   0     826888610   560787
    sys cmd         2680812     0     2680812     0
    up time         0           0     0           0
    RPC services    0           0     0           0
    TCP conn        172374390   0     223907351   212485
    UDP conn        122738711   0     195481873   348302
    ARP tbl         297162520   0     404571469   0
    Xlate_Timeout   0           0     0           0
    IPv6 ND tbl     0           0     0           0
    VPN IKEv1 SA    4499        0     5612        0
    VPN IKEv1 P2    120583      0     182830      0
    VPN IKEv2 SA    0           0     0           0
    VPN IKEv2 P2    0           0     0           0
    VPN CTCP upd    0           0     0           0
    VPN SDI upd     0           0     0           0
    VPN DHCP upd    0           0     0           0
    SIP Session     0           0     0           0
    Route Session   0           0     0           0
    User-Identity   45358       0     58663       0

    Logical Update Queue Information
                    Cur     Max     Total
    Recv Q:         0       31      1045252720
    Xmit Q:         0       1503    745317343

Jouni Forss · ‎10-24-2012

What's the complete configuration of the "inside" interface?

Does it have the standby IP address configured in the interface configuration? I guess it should, since you have pinged it.

- Jouni

Marvin Rhoads · ‎10-24-2012

Yes that output looks odd. The standby IP should be showing up.

What does the interface address section of your inside interface look like? I would expect something like:

 nameif inside
 security-level 100
 ip address 10.10.0.1 255.255.255.240 standby 10.10.0.2

Kyle Smith · ‎10-24-2012

Here is what the inside interface looks like on a show run:

interface GigabitEthernet0/1
 nameif inside
 security-level 100
 ip address 10.10.0.1 255.255.0.0 standby 10.10.0.2

When we rolled back, for some reason the standby address didn't take on the inside interface, so I had to add it manually just now. During Saturday's maintenance I remember all interfaces showing as monitored.

Here is the new show failover output:

CLE-FW01# show failover
Failover On
Failover unit Primary
Failover LAN Interface: failover GigabitEthernet0/3 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 3 of 160 maximum
Version: Ours 8.4(3), Mate 8.4(3)
Last Failover at: 06:40:31 EDT Oct 20 2012
    This host: Primary - Active
        Active time: 10033667 (sec)
        slot 0: ASA5520 hw/sw rev (2.0/8.4(3)) status (Up Sys)
          Interface outside (X.X.X.X): Normal (Monitored)
          Interface inside (10.10.0.1): Normal (Monitored)
          Interface dmz (X.X.X.X): Normal (Monitored)
        slot 1: ASA-SSM-20 hw/sw rev (1.0/7.0(7)E4) status (Up/Up)
          IPS, 7.0(7)E4, Up
    Other host: Secondary - Standby Ready
        Active time: 11034958 (sec)
        slot 0: ASA5520 hw/sw rev (2.0/8.4(3)) status (Up Sys)
          Interface outside (X.X.X.X): Normal (Monitored)
          Interface inside (10.10.0.2): Normal (Monitored)
          Interface dmz (X.X.X.X): Normal (Monitored)
        slot 1: ASA-SSM-20 hw/sw rev (1.0/7.0(7)E4) status (Up/Up)
          IPS, 7.0(7)E4, Up

Stateful Failover Logical Update Statistics
    Link : failover GigabitEthernet0/3 (up)
    Stateful Obj    xmit        xerr  rcv         rerr
    General         595396365   0     826888884   560787
    sys cmd         2681086     0     2681086     0
    up time         0           0     0           0
    RPC services    0           0     0           0
    TCP conn        172455720   0     223907351   212485
    UDP conn        122836608   0     195481873   348302
    ARP tbl         297252441   0     404571469   0
    Xlate_Timeout   0           0     0           0
    IPv6 ND tbl     0           0     0           0
    VPN IKEv1 SA    4500        0     5612        0
    VPN IKEv1 P2    120636      0     182830      0
    VPN IKEv2 SA    0           0     0           0
    VPN IKEv2 P2    0           0     0           0
    VPN CTCP upd    0           0     0           0
    VPN SDI upd     0           0     0           0
    VPN DHCP upd    0           0     0           0
    SIP Session     0           0     0           0
    Route Session   0           0     0           0
    User-Identity   45374       0     58663       0

    Logical Update Queue Information
                    Cur     Max     Total
    Recv Q:         0       31      1045252994
    Xmit Q:         0       1503    745636858