Solved: Issue regarding failover on an Active/Active ASA cluster with shared interfaces

Siebe Brouwer · ‎08-12-2011

Hello Community,

I have an issue regarding failover on an Active/Active ASA cluster with some shared interfaces. I have two ASA firewalls running 16 contexts. Most "inside" and "outside" interfaces are shared between contexts, and each context has at least one interface that is dedicated to that context.

All contexts are operating fine and failover works too. However, all interfaces that are shared between contexts are in the "Normal (Waiting)" state. An interface that belongs to just one context is in the "Normal" state.

Output from "show failover" from the System context:

Context1/act# show failover

<removed details>

This host: Primary

slot 0: ASA5520 hw/sw rev (1.1/8.2(5)) status (Up Sys)

Context01 Interface outside (x.x.0.132): Normal (Waiting)

Context01 Interface inside (x.x.230.43): Normal (Waiting)

Context01 Interface dmz (x.x.8.1): Normal

Context02 Interface outside (x.x.0.138): Normal (Waiting)

Context02 Interface inside (x.x.230.40): Normal (Waiting)

Context02 Interface dmz (x.x.203.1): Normal

<removed details of other contexts>

slot 1: empty

Other host: Secondary

slot 0: ASA5520 hw/sw rev (1.1/8.2(5)) status (Up Sys)

Context01 Interface outside (x.x.0.133): Normal (Waiting)

Context01 Interface inside (x.x.230.44): Normal (Waiting)

Context01 Interface dmz (x.x.8.2): Normal

Context02 Interface outside (x.x.0.139): Normal (Waiting)

Context02 Interface inside (x.x.230.41): Normal (Waiting)

Context02 Interface dmz (x.x.203.2): Normal

<removed details of other contexts>

slot 1: empty

<removed details>

Context1/act#
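With 16 contexts the "show failover" output gets long, so a small script can help pick out the affected interfaces. The following is only a sketch: the function name waiting_interfaces and the regular expression are my own, and the line format is assumed from the output shown above, not taken from any Cisco documentation.

```python
import re

# Matches interface status lines such as:
#   Context01 Interface outside (x.x.0.132): Normal (Waiting)
# The format is assumed from the "show failover" output in this thread.
IFACE_RE = re.compile(
    r"^\s*(?P<context>\S+) Interface (?P<name>\S+) "
    r"\((?P<ip>[^)]*)\): (?P<state>.+?)\s*$"
)


def waiting_interfaces(show_failover_text):
    """Return (context, interface, ip) for every 'Normal (Waiting)' line."""
    result = []
    for line in show_failover_text.splitlines():
        m = IFACE_RE.match(line)
        if m and m.group("state") == "Normal (Waiting)":
            result.append((m.group("context"), m.group("name"), m.group("ip")))
    return result


sample = """\
slot 0: ASA5520 hw/sw rev (1.1/8.2(5)) status (Up Sys)
Context01 Interface outside (x.x.0.132): Normal (Waiting)
Context01 Interface inside (x.x.230.43): Normal (Waiting)
Context01 Interface dmz (x.x.8.1): Normal
"""

for context, name, ip in waiting_interfaces(sample):
    print(context, name, ip)
```

Run against the full output, this should list only the shared interfaces if the pattern described above holds.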

There is more: if I ping the IP address of an interface on the standby context from the active context, the ping fails for shared interfaces but succeeds for a non-shared interface:

Context1/Context1/act# ping outside x.x.0.133

Type escape sequence to abort.

Sending 5, 100-byte ICMP Echos to x.x.0.133, timeout is 2 seconds:

?????

Success rate is 0 percent (0/5)

Context1/Context1/act# ping inside x.x.230.44

Type escape sequence to abort.

Sending 5, 100-byte ICMP Echos to x.x.230.44, timeout is 2 seconds:

?????

Success rate is 0 percent (0/5)

Context1/Context1/act# ping dmz x.x.8.2

Type escape sequence to abort.

Sending 5, 100-byte ICMP Echos to x.x.8.2, timeout is 2 seconds:

!!!!!

Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/10 ms

Context1/Context1/act#

Finally, the context constantly complains about receiving spoofed traffic. This is my real issue: all contexts generate these messages and they are flooding the syslog server.

Context1/Context1/act# show logging

<removed details>

Aug 12 2011 13:04:48: %ASA-1-105009: (Primary_group_1) Testing on interface outside Passed

Aug 12 2011 13:04:48: %ASA-1-105009: (Primary_group_1) Testing on interface inside Passed

Aug 12 2011 13:04:48: %ASA-1-106021: Deny SCPS reverse path check from x.x.0.132 to x.x.0.133 on interface outside

Aug 12 2011 13:04:48: %ASA-1-106021: Deny SCPS reverse path check from x.x.230.43 to x.x.230.44 on interface inside

Aug 12 2011 13:04:48: %ASA-1-106021: Deny SCPS reverse path check from x.x.0.132 to x.x.0.133 on interface outside

Aug 12 2011 13:04:48: %ASA-1-106021: Deny SCPS reverse path check from x.x.230.43 to x.x.230.44 on interface inside

Aug 12 2011 13:04:48: %ASA-1-106021: Deny SCPS reverse path check from x.x.0.132 to x.x.0.133 on interface outside

Aug 12 2011 13:04:48: %ASA-1-106021: Deny SCPS reverse path check from x.x.230.43 to x.x.230.44 on interface inside

Aug 12 2011 13:04:52: %ASA-1-106021: Deny SCPS reverse path check from x.x.0.132 to x.x.0.133 on interface outside

Aug 12 2011 13:04:52: %ASA-1-106021: Deny SCPS reverse path check from x.x.230.43 to x.x.230.44 on interface inside

Aug 12 2011 13:04:52: %ASA-1-106021: Deny SCPS reverse path check from x.x.0.132 to x.x.0.133 on interface outside

Aug 12 2011 13:04:52: %ASA-1-106021: Deny SCPS reverse path check from x.x.230.43 to x.x.230.44 on interface inside

Aug 12 2011 13:04:52: %ASA-1-106021: Deny SCPS reverse path check from x.x.0.132 to x.x.0.133 on interface outside

Aug 12 2011 13:04:52: %ASA-1-106021: Deny SCPS reverse path check from x.x.230.43 to x.x.230.44 on interface inside

Aug 12 2011 13:04:52: %ASA-1-106021: Deny SCPS reverse path check from x.x.0.132 to x.x.0.133 on interface outside

Aug 12 2011 13:04:52: %ASA-1-106021: Deny SCPS reverse path check from x.x.230.43 to x.x.230.44 on interface inside

Disabling reverse path filter checking doesn't stop the messages. My guess is that the ASA is blocking the "Hello" messages it needs to determine the monitor state of the interface (hence the Waiting state).
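To get a feel for how much of the syslog volume comes from these messages, the 106021 lines can be tallied per interface and address pair. This is a rough sketch: count_rpf_denies is a made-up helper name, and the regular expression is based only on the log lines pasted above.

```python
import re
from collections import Counter

# Pattern assumed from the %ASA-1-106021 lines shown in this post.
RPF_RE = re.compile(
    r"%ASA-\d-106021: Deny (?P<proto>\S+) reverse path check "
    r"from (?P<src>\S+) to (?P<dst>\S+) on interface (?P<iface>\S+)"
)


def count_rpf_denies(log_text):
    """Count 106021 messages per (interface, source, destination)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = RPF_RE.search(line)
        if m:
            counts[(m.group("iface"), m.group("src"), m.group("dst"))] += 1
    return counts


sample_log = """\
Aug 12 2011 13:04:48: %ASA-1-105009: (Primary_group_1) Testing on interface outside Passed
Aug 12 2011 13:04:48: %ASA-1-106021: Deny SCPS reverse path check from x.x.0.132 to x.x.0.133 on interface outside
Aug 12 2011 13:04:48: %ASA-1-106021: Deny SCPS reverse path check from x.x.230.43 to x.x.230.44 on interface inside
Aug 12 2011 13:04:52: %ASA-1-106021: Deny SCPS reverse path check from x.x.0.132 to x.x.0.133 on interface outside
"""

for key, n in count_rpf_denies(sample_log).most_common():
    print(n, *key)
```

In the log above every deny is between the active and standby address of the same shared interface, which is what the tally would make visible across all contexts.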

Any help is appreciated!

ASA version: 8.2(5)

Platform: ASA5520-K8

Kind regards,

Siebe

varrao · ‎08-12-2011

Can you share the output of "show run failover" from both devices?

-Varun

Thanks,
Varun Rao

Siebe Brouwer · ‎08-12-2011

Sure:

Context1/act# show run failover

failover

failover lan unit primary

failover lan interface failover-lan GigabitEthernet0/3

failover replication http

failover link failover-lan GigabitEthernet0/3

failover interface ip failover-lan 10.10.100.1 255.255.255.248 standby 10.10.100.2

failover group 1

preempt 120

failover group 2

preempt 120

Context1/act#

Context1/act# failover exec standby show run failover

failover

failover lan unit secondary

failover lan interface failover-lan GigabitEthernet0/3

failover replication http

failover link failover-lan GigabitEthernet0/3

failover interface ip failover-lan 10.10.100.1 255.255.255.248 standby 10.10.100.2

failover group 1

preempt 120

failover group 2

preempt 120

Context1/act#

praprama · ‎08-23-2011

Hi Siebe,

It's interesting that the ASA is seeing packets sent with its own IP address as the source. Do you have mac-address auto enabled in the system context? Try that and see if it helps.

Is this a new configuration, or did you upgrade to 8.2(5) and notice that it stopped working?

Regards,

Prapanch

Siebe Brouwer · ‎08-23-2011

Hi Prapanch,

mac-address auto is enabled. I checked the MAC addresses and they are all unique.

I upgraded to 8.2(5) to see if it would solve the issue. Unfortunately I don't recall when this behaviour started.

I'm guessing here, but the cluster has had lots of "In-service Upgrades" and the cluster uptime is more than 4 years.

Context1/act# show version

Cisco Adaptive Security Appliance Software Version 8.2(5)

Device Manager Version 6.4(5)

Compiled on Fri 20-May-11 16:00 by builders

System image file is "disk0:/asa825-k8.bin"

Config file at boot was "startup-config"

smcfw05ztm up 11 days 10 hours

failover cluster up 4 years 114 days

Context1/act#

Could it be time for a clean boot?

praprama · ‎08-23-2011

Can you get captures from a shared interface in one of the contexts? For example, from Context01:

cap capo interface outside match ip host x.x.0.132 host x.x.0.133

and then post the output of show cap capo

Regards,

Prapanch

Siebe Brouwer · ‎08-24-2011

Here's what I've done:

First I used Wireshark to capture the hellos on a non-shared interface. It looks like the ASA uses IP protocol 105 to check the interface status of its peer. I can see packets flowing in both directions. On a shared interface there were no packets with IP protocol 105. So my conclusion is that the hellos never leave the interface. To prove this I started a capture on the active and standby firewalls:

Context1/Context1/act# show capture

capture drop type asp-drop all [Buffer Full - 524255 bytes]

Context1/Context1/act# show capture drop | in x.x.241.

2: 10:01:04.191167 x.x.241.13 > x.x.241.14: ip-proto-105, length 48 Drop-reason: (rpf-violated) Reverse-path verify failed

4: 10:01:07.938260 x.x.241.13 > x.x.241.14: ip-proto-105, length 44 Drop-reason: (rpf-violated) Reverse-path verify failed

6: 10:01:07.943951 x.x.241.13 > x.x.241.14: ip-proto-105, length 44 Drop-reason: (rpf-violated) Reverse-path verify failed

8: 10:01:07.950344 x.x.241.13 > x.x.241.14: ip-proto-105, length 44 Drop-reason: (rpf-violated) Reverse-path verify failed

10: 10:01:07.956264 x.x.241.13 > x.x.241.14: ip-proto-105, length 44 Drop-reason: (rpf-violated) Reverse-path verify failed

Context1/Context1/stby# show capture

capture drop type asp-drop all [Buffer Full - 524255 bytes]

Context1/Context1/stby# show capture drop | in x.x.241.

2: 10:27:02.648388 x.x.241.14 > x.x.241.13: ip-proto-105, length 48 Drop-reason: (rpf-violated) Reverse-path verify failed

4: 10:27:04.164755 x.x.241.14 > x.x.241.13: ip-proto-105, length 44 Drop-reason: (rpf-violated) Reverse-path verify failed

6: 10:27:04.170828 x.x.241.14 > x.x.241.13: ip-proto-105, length 44 Drop-reason: (rpf-violated) Reverse-path verify failed

8: 10:27:04.176290 x.x.241.14 > x.x.241.13: ip-proto-105, length 44 Drop-reason: (rpf-violated) Reverse-path verify failed

10: 10:27:04.177328 x.x.241.14 > x.x.241.13: ip-proto-105, length 44 Drop-reason: (rpf-violated) Reverse-path verify failed

x.x.241.13 and x.x.241.14 are the IP addresses on the inside interface of the active and standby firewall, respectively.

So the ASA is dropping its own packets. Since the ASA is complaining about RPF, I disabled RPF checking on all interfaces (no ip verify reverse-path interface inside). But the captured results were exactly the same!
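For anyone who wants to check their own asp-drop capture for the same pattern, here is a minimal sketch that groups drops by protocol and drop reason. The helper name summarize_drops is hypothetical, and the regular expression only covers the line layout shown in the capture above (no port numbers, one drop reason per line).

```python
import re
from collections import Counter

# Layout assumed from the "show capture drop" lines in this post, e.g.:
#   2: 10:01:04.191167 x.x.241.13 > x.x.241.14: ip-proto-105, length 48
#      Drop-reason: (rpf-violated) Reverse-path verify failed
DROP_RE = re.compile(
    r"^\s*\d+: (?P<time>\S+) (?P<src>\S+) > (?P<dst>[^:\s]+): "
    r"(?P<proto>[^,\s]+), length (?P<length>\d+) "
    r"Drop-reason: \((?P<reason>[^)]+)\)"
)


def summarize_drops(capture_text):
    """Count dropped packets per (protocol, drop reason)."""
    counts = Counter()
    for line in capture_text.splitlines():
        m = DROP_RE.match(line)
        if m:
            counts[(m.group("proto"), m.group("reason"))] += 1
    return counts


sample_capture = """\
2: 10:01:04.191167 x.x.241.13 > x.x.241.14: ip-proto-105, length 48 Drop-reason: (rpf-violated) Reverse-path verify failed
4: 10:01:07.938260 x.x.241.13 > x.x.241.14: ip-proto-105, length 44 Drop-reason: (rpf-violated) Reverse-path verify failed
"""

for (proto, reason), n in summarize_drops(sample_capture).items():
    print(proto, reason, n)
```

If protocol 105 only ever shows up with the rpf-violated reason on shared interfaces, that matches what is described in this post.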

praprama · ‎08-24-2011

It's interesting that you do not see any packets leaving the ASA but still see packets being dropped. Disabling the RPF check is probably not helping here because the ASA sees the source IP address of the packet as its own IP address.

I would suggest opening a TAC case to have this looked into, if you have a valid support contract for this ASA, as troubleshooting remotely this way is not really feasible.

Do let us know if you manage to figure out what's going on.

Regards,

Prapanch

Paul Rehill · ‎09-04-2011

We ran into the same issue and apparently it is an unpublished Cisco internal bug. Go figure.

CSCtq28055

From TAC

You would not find this release on the Cisco official website as it is an internal release.

Need to upgrade to 8.2(5.11).

Paul

Siebe Brouwer · ‎09-05-2011

OK. Thanks for the heads up. This proves the issue is known and solved by Cisco.

Since it is not a serious issue, I will wait for Cisco to release either an interim release of 8.2(5) or maybe 8.2(6).

8.2(5) was released on the 23rd of May 2011, which is more than 3 months ago. So I guess it won't be long now....

Siebe Brouwer · ‎10-05-2011

Cisco released interim release 8.2(5.13) on the 18th of September. After installing this version the issue was gone.

The interim release notes don't mention the bug ID though.

Big thank you to Paul for providing the correct answer. Small thank you to everyone else that contributed.
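
For anyone checking a whole set of firewalls against this, the version strings can be compared with a few lines of script. This is a sketch under assumptions: the helper names are made up, the 8.2(5.11) threshold is simply the release TAC named in this thread, and the comparison is deliberately limited to the same major.minor train because nothing here says which other trains carry the fix.

```python
import re

# Accepts strings like "8.2(5)" or "8.2(5.13)" as printed by "show version".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)(?:\.(\d+))?\)$")


def parse_asa_version(text):
    """Turn '8.2(5.13)' into (8, 2, 5, 13); a missing interim part becomes 0."""
    m = VERSION_RE.match(text.strip())
    if not m:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, maint, interim = m.groups()
    return (int(major), int(minor), int(maint), int(interim or 0))


def includes_fix(running, first_fixed="8.2(5.11)"):
    """True if 'running' is in the same train and at or above 'first_fixed'.

    The default threshold is the release mentioned by TAC in this thread;
    other trains are reported as False because their status is unknown here.
    """
    r = parse_asa_version(running)
    f = parse_asa_version(first_fixed)
    return r[:2] == f[:2] and r >= f


for version in ("8.2(5)", "8.2(5.13)"):
    print(version, includes_fix(version))
```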