Dual ISP Failover with two ASAs that are not HA

cchughes · 12-05-2012

Hello,

I am having a hard time getting tunnel failover working. My setup is as follows:

My default routes come from the border routers. The 6513 peers with the 7206s using BGP to get the default route from each ISP into the core. On the core, I use BGP weighting so that my primary default route points to ISP1. So far so good: when I look at my core, I see two defaults, with ISP1 preferred.
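To make the weighting concrete, here is a minimal Python sketch of how the core ends up preferring ISP1's default route and falls back to ISP2's when ISP1's route is withdrawn. It models only the weight step of BGP best-path selection; the route labels and weight values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str  # hypothetical label for the advertising 7206
    weight: int    # Cisco-specific and local to this router; higher is preferred


def best_default(routes: List[Route]) -> Optional[Route]:
    """Pick the default route with the highest weight.

    Simplification: real BGP best-path selection goes on to compare local
    preference, AS-path length and more when weights tie.
    """
    defaults = [r for r in routes if r.prefix == "0.0.0.0/0"]
    return max(defaults, key=lambda r: r.weight, default=None)


# Hypothetical weights: 200 for ISP1, 0 (the default for learned routes) for ISP2.
rib = [
    Route("0.0.0.0/0", "7206-ISP1", weight=200),
    Route("0.0.0.0/0", "7206-ISP2", weight=0),
]
print(best_default(rib).next_hop)  # 7206-ISP1

# ISP1 stops sending its default: that route is withdrawn and ISP2's takes over.
rib = [r for r in rib if r.next_hop != "7206-ISP1"]
print(best_default(rib).next_hop)  # 7206-ISP2
```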

Each ASA has an IPsec tunnel configured to the headend site (not shown). The headend site has a crypto map entry with the ISP1 and ISP2 peers defined, in that order, using the "set peer" command.
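An ordered "set peer" list can be pictured as the minimal Python sketch below: when the headend initiates, it tries the first peer and moves to the next only if negotiation fails. The peer addresses are hypothetical (RFC 5737 documentation ranges), and negotiation is reduced to a yes/no callback.

```python
from typing import Callable, Optional, Sequence


def select_peer(peers: Sequence[str], negotiate: Callable[[str], bool]) -> Optional[str]:
    """Try peers in 'set peer' order; return the first one that negotiates."""
    for peer in peers:
        if negotiate(peer):
            return peer
    return None


# Hypothetical outside addresses of the two remote ASAs: ISP1 first, then ISP2.
PEERS = ["203.0.113.1", "198.51.100.1"]

reachable = {"203.0.113.1": True, "198.51.100.1": True}
print(select_peer(PEERS, lambda p: reachable[p]))  # 203.0.113.1

reachable["203.0.113.1"] = False  # path through ISP1 is down
print(select_peer(PEERS, lambda p: reachable[p]))  # 198.51.100.1
```

As I understand it, the list order only governs tunnels the headend itself initiates. With the default bidirectional behavior, the headend also answers whichever listed peer initiates, which is one way both remote ASAs can end up with a tunnel.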

Failover works great if an ISP drops the connection or my 7206 or ASA fails, but...

While testing failover, I ran into a situation where both tunnels were active and traffic between the sites had problems. I could not determine the root cause. My best guess is that some traffic was leaving through one tunnel, and its return traffic, arriving through the other tunnel, was dropped by the firewall because no connection had been built for it. After some reading, I found that to use multiple peers in the "set peer" statement, I needed to configure my headend as "originate-only". I have not done this yet because I have concerns: if the headend is "originate-only" and the tunnel drops for whatever reason, I cannot wait for interesting traffic from the headend site to this site to bring the tunnel back up, because most of the traffic originates at this site.
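The concern about "originate-only" can be captured in a small Python sketch. The three connection types mirror the ASA crypto map options (bidirectional, originate-only, answer-only); the model is a simplification that assumes the remote ASA is bidirectional and its peer is reachable.

```python
from enum import Enum


class ConnType(Enum):
    BIDIRECTIONAL = "bidirectional"
    ORIGINATE_ONLY = "originate-only"
    ANSWER_ONLY = "answer-only"


def tunnel_can_start(headend_type: ConnType, initiator: str) -> bool:
    """Can a tunnel come up, given the headend's connection type and who starts it?

    initiator is "headend" or "remote". Simplified model only.
    """
    if initiator == "headend":
        return headend_type is not ConnType.ANSWER_ONLY
    if initiator == "remote":
        return headend_type is not ConnType.ORIGINATE_ONLY
    raise ValueError(f"unknown initiator: {initiator!r}")


for ct in ConnType:
    print(f"{ct.value:15} headend-initiated={tunnel_can_start(ct, 'headend')} "
          f"remote-initiated={tunnel_can_start(ct, 'remote')}")
```

With originate-only on the headend, traffic that starts at the remote site cannot bring the tunnel up, which is exactly the worry when most flows originate remotely.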

I have been reading about IKE keepalives and DPD, but it doesn't sound like they will re-initiate the tunnel. Is that correct? If so, I'm looking for a way to make this work. Any insight would really help.
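A minimal Python sketch of the behavior being asked about, assuming (as is generally the case for crypto-map tunnels) that keepalives/DPD only detect a dead peer and clear its SA, and that a new tunnel is negotiated only when interesting traffic arrives. The `max_missed` threshold is a hypothetical parameter, not an ASA keyword, and the timers are abstracted into ticks.

```python
from dataclasses import dataclass


@dataclass
class SA:
    peer: str
    missed: int = 0
    up: bool = True


def dpd_tick(sa: SA, peer_replied: bool, max_missed: int = 3) -> None:
    """One keepalive interval: a reply resets the count; too many misses clear the SA."""
    if not sa.up:
        return
    if peer_replied:
        sa.missed = 0
    else:
        sa.missed += 1
        if sa.missed >= max_missed:
            sa.up = False  # SA torn down; nothing in DPD itself rebuilds it


def maybe_rebuild(sa: SA, interesting_traffic: bool) -> None:
    """Only interesting traffic triggers a new negotiation."""
    if not sa.up and interesting_traffic:
        sa.up, sa.missed = True, 0


sa = SA("203.0.113.1")  # hypothetical peer address
for _ in range(3):
    dpd_tick(sa, peer_replied=False)
print(sa.up)  # False: DPD cleared the SA
maybe_rebuild(sa, interesting_traffic=False)
print(sa.up)  # False: still down until traffic arrives
maybe_rebuild(sa, interesting_traffic=True)
print(sa.up)  # True
```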

anujsharma85 · 12-06-2012

This should work; however, we need to make sure that the headend site has the statement configured as:

set peer

so that only one peer is active on the headend at a time. Once this is configured, the headend configuration is fine, unless we are hitting a bug on that site.

With respect to these two ASA sites, we now need to make sure that SLA monitoring is reliable. Only one site should be active at a time, because when the headend has to clear a tunnel, it also checks whether the remote peer is still alive; if that peer is still alive, the old tunnel will remain on the headend.
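The point about old tunnels lingering can be modeled in a few lines of Python. This is a simplified sketch, assuming the headend clears an SA only when its peer stops answering keepalives; the peer names are hypothetical.

```python
from typing import Dict, Set


def sas_kept(established: Set[str], peer_alive: Dict[str, bool]) -> Set[str]:
    """SAs the headend still holds after keepalives run.

    Simplified model: an SA is cleared only when its peer stops answering.
    """
    return {peer for peer in established if peer_alive.get(peer, False)}


# Both remote ASAs built an SA during a failover test and both still answer
# keepalives, so the headend keeps two tunnels for the same remote site.
print(sorted(sas_kept({"ASA1-ISP1", "ASA2-ISP2"}, {"ASA1-ISP1": True, "ASA2-ISP2": True})))
# ['ASA1-ISP1', 'ASA2-ISP2']

# Only when the secondary path really goes away is its SA cleared.
print(sorted(sas_kept({"ASA1-ISP1", "ASA2-ISP2"}, {"ASA1-ISP1": True, "ASA2-ISP2": False})))
# ['ASA1-ISP1']
```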

Hope that helps.

Regards,

Anuj

cchughes · 12-06-2012

That’s what I thought. I even have the same setup working at another site connected to the headend.

Both peers are listed in the set peer statement, with the first peer being the weighted one.

I wonder if this is a bug. I’ll look.

cchughes · 12-06-2012

Just checked for bugs but nothing found that matches my issue. I also read the release notes for the versions I am running. Nothing there either.

andrew.prince · 12-06-2012

Hi,

The BGP sessions should really terminate on routers outside of your firewalls; then you could run an iBGP session between them. Also run HSRP on the internal link: your firewalls would point to the HSRP address, and the routers would decide which one is primary, based either on the physical circuit being up/up or on an IP SLA probe to an IP address out on the Internet. You can then keep two VPN tunnels up all the time and run a dynamic routing protocol over them, letting it choose which tunnel to use to reach the remote end.
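The HSRP-plus-IP-SLA idea can be sketched in Python as follows. The model uses HSRP's default priority of 100, breaks ties on the highest interface IP, assumes preemption is enabled, and lowers a router's priority when its tracked SLA probe fails. The router names, addresses, and the decrement of 20 are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class HsrpRouter:
    name: str
    ip: str
    priority: int = 100      # HSRP default priority
    tracked_ok: bool = True  # e.g. an IP SLA probe toward an Internet address
    decrement: int = 20      # hypothetical value for this example

    def effective_priority(self) -> int:
        return self.priority - (0 if self.tracked_ok else self.decrement)


def active_router(routers: List[HsrpRouter]) -> HsrpRouter:
    """Highest effective priority wins; ties go to the highest interface IP.

    Assumes preemption is enabled, so a recovered router takes the role back.
    """
    return max(routers, key=lambda r: (r.effective_priority(), ipaddress.ip_address(r.ip)))


r1 = HsrpRouter("rtr-isp1", "10.0.0.2", priority=110)
r2 = HsrpRouter("rtr-isp2", "10.0.0.3")
print(active_router([r1, r2]).name)  # rtr-isp1
r1.tracked_ok = False                # SLA probe via ISP1 fails: 110 - 20 = 90
print(active_router([r1, r2]).name)  # rtr-isp2
```

The firewalls would then only ever see the virtual IP as their next hop, while the routers sort out which ISP is in use.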

JMTPW

HTH

cchughes · 12-06-2012

“The BGP sessions should really terminate on routers outside of your firewalls, then you could run an iBGP session between them.”

I agree that I could do that, but I'm not sure how it would solve my tunnel issue. When I configured BGP, I weighted one link, and my core shows it as the preferred default. The real problem is that the headend has a tunnel built to each ISP at the remote site. I should have mentioned that I have another remote site connected to the headend site, and it fails over fine. For some reason, at this site both tunnels come up, and I believe that is the cause of my problem.

“Also run HSRP on the internal link,”

Please explain. I have a single 6500. To run HSRP and get the benefit of a VIP, wouldn’t I need a second switch sharing the same VLAN?

“You can then have 2 x VPN tunnels up/up all the time”

This would be nice. I have two tunnels up now, but it looks like the headend gets confused over which one to send traffic on. Either that, or I have a buried static route somewhere at the remote site pointing traffic out the secondary link. (I just thought of that; I will check, thanks!)

anujsharma85 · 12-07-2012

Ideally, this would be an ASA HA pair with both ISPs terminated on the same cluster. Since that is not the case here, I have a few questions and suggestions to get more clarity about the scenario:

1.) Where is SLA tracking configured in this scenario to monitor the ISPs, and how does it interact with the VPN? In the ideal scenario, on a single ASA HA pair, tracking changes the default route; as a result, one peer is lost entirely and its tunnel is cleared automatically.

2.) If SLA tracking runs internally, in the core network behind the ASAs, and decides which tunnel is active, then this issue is bound to occur intermittently, because the inactive peer is never completely lost. We can only minimize the damage by adjusting the keepalive frequency. It's worth a try.

3.) If we are willing to change the network design, we can follow Andrew's suggestion with a small modification: configure HSRP on the routers so that a virtual IP address can be used, and also run the ASAs as an HA pair, which eliminates the root cause of the issue. Once two separate devices are no longer terminating the VPN, the issue should not occur at all. This gives redundancy at both the router level and the ASA level, with a stable and supported failover setup.

Regards,

Anuj

cchughes · 12-07-2012

Answers below:

Ideally, this would be an ASA HA pair with both ISPs terminated on the same cluster. Since that is not the case here, I have a few questions and suggestions to get more clarity about the scenario:

Response: Because we received only a /28 IP block from each provider, we elected to run the firewalls as two separate (non-HA) firewalls.

1.) Where is SLA tracking configured in this scenario to monitor the ISPs, and how does it interact with the VPN? In the ideal scenario, on a single ASA HA pair, tracking changes the default route; as a result, one peer is lost entirely and its tunnel is cleared automatically.

Response: The default route is received by the border routers (7200s) and passed on to the core 6513. We weight the primary on the core so that it is selected as the default, so I have two BGP-derived default routes on the core. If the primary ISP stops sending a default route, we lose that ISP's default route on the core and the second default route takes over. This appears to work well. No IP SLA is required.

2.) If SLA tracking runs internally, in the core network behind the ASAs, and decides which tunnel is active, then this issue is bound to occur intermittently, because the inactive peer is never completely lost. We can only minimize the damage by adjusting the keepalive frequency. It's worth a try.

Response: No SLA is required (see the response above).

3.) If we are willing to change the network design, we can follow Andrew's suggestion with a small modification: configure HSRP on the routers so that a virtual IP address can be used, and also run the ASAs as an HA pair, which eliminates the root cause of the issue. Once two separate devices are no longer terminating the VPN, the issue should not occur at all. This gives redundancy at both the router level and the ASA level, with a stable and supported failover setup.

Response: I don't understand how I could run HSRP anywhere in this or any other design using the equipment available. Running it between the 7200s won't work because the providers are not routing each other's IP blocks, and since we don't have a /24 from either ISP, BGP peering and HSRP are not options.
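The addressing constraint can be checked with Python's standard `ipaddress` module. The two /28 blocks below are hypothetical stand-ins for the ISP assignments, and the /24 cut-off reflects the common (not universal) practice of providers filtering IPv4 prefixes longer than /24.

```python
import ipaddress

# Hypothetical /28 blocks standing in for the two ISP assignments.
isp1 = ipaddress.ip_network("203.0.113.0/28")
isp2 = ipaddress.ip_network("198.51.100.0/28")

print(isp1.num_addresses)   # 16
print(isp1.overlaps(isp2))  # False: no shared subnet to host an HSRP virtual IP


def likely_accepted_upstream(net: ipaddress.IPv4Network, max_len: int = 24) -> bool:
    """Assumption: prefixes longer than /24 are commonly filtered by providers."""
    return net.prefixlen <= max_len


print(likely_accepted_upstream(isp1))  # False
```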

cchughes · 12-07-2012

I have considered setting this up differently. There is a design for my scenario (dual ISPs, no BGP peering, no HA) where two interfaces are configured as outside interfaces, one connected to each ISP. This would make HA available, but I would still need to configure two peers in the "crypto map xx set peer" statement.

I think the dual peers at the headend site are the issue. Because two tunnels are built on the headend, return traffic may go back to the wrong peer: the headend has no way of knowing which tunnel to put the packet on.
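The suspected failure can be sketched end to end in Python: a request leaves through the primary ASA, the headend holds SAs to both ASAs and sends the reply through the secondary, and the secondary drops it because it has no connection state. The tie-break rule in `headend_pick_sa` is hypothetical (the real selection logic is device-specific), and the addresses and port are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Flow = Tuple[str, str, int]  # (source IP, destination IP, destination port)


@dataclass
class StatefulFirewall:
    name: str
    conns: Set[Flow] = field(default_factory=set)

    def outbound(self, flow: Flow) -> None:
        """Build connection state when a flow leaves through this firewall."""
        self.conns.add(flow)

    def accepts_reply(self, flow: Flow) -> bool:
        """A reply is allowed only if this firewall built state for the flow."""
        return flow in self.conns


def headend_pick_sa(active_peers: List[str]) -> str:
    """Hypothetical tie-break: use the most recently established SA.

    The point is only that the choice is not tied to the firewall the
    flow originally left through.
    """
    return active_peers[-1]


asa1, asa2 = StatefulFirewall("ASA1"), StatefulFirewall("ASA2")
flow = ("10.1.1.10", "10.9.9.20", 389)  # hypothetical remote host -> headend LDAP server
asa1.outbound(flow)                     # request leaves through the primary ASA

reply_via = headend_pick_sa(["ASA1", "ASA2"])  # both tunnels are up
firewall = asa1 if reply_via == "ASA1" else asa2
print(reply_via, firewall.accepts_reply(flow))  # ASA2 False
```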

anujsharma85 · 12-07-2012

Yes, you are right. HA with dual ISPs simplifies the configuration, because then there is no possibility of two VPNs being active at the same time.

For reference, see the following links:

https://supportforums.cisco.com/community/netpro/security/vpn/blog/2011/04/25/ipsec-vpn-redundancy-failover-over-redundant-isp-links

http://www.cisco.com/en/US/products/hw/vpndevc/ps2030/products_configuration_example09186a00806e880b.shtml

However, if we wish to keep the same design, we need to make sure that only one ISP is active at a time, and that when the VPN fails over, the old peer does not remain active; otherwise, the headend tunnel may not clear. If the issue persists and you see two tunnels active at the same time, I am afraid this will require some troubleshooting.

We would need the output of "show crypto isakmp sa" and "show crypto ipsec sa" from the headend and from both ASAs at the time of the issue, along with packet tracer output on the headend showing which tunnel the traffic follows. We will also have to check, via ping, whether the old peer is still reachable from the headend.

To expedite the process, we can also take captures on the headend device to see whether keepalives are being sent over the tunnel, because our main goal is that when a peer dies, its tunnel is cleared automatically.

Regarding the bugs I mentioned earlier with tunnels not clearing on the ASA: as far as I can recall, they existed in 8.0.x and older versions, but not in 8.2.4 and later.

If the issue is urgent, I would also recommend opening a TAC case, since troubleshooting will require a few tests on this setup to check VPN failover and failback.

Regards,

Anuj

cchughes · 12-17-2012

I worked some more on this. I discovered that there were a few different traffic flows that were initiating the tunnel on my secondary link. One was syslog traffic destined for the headend, the other was a ping being run from a host at the headend site to the inside interface of the secondary firewall. Both made sense as the secondary sees all traffic destined for the headend site as "interesting".
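Why those flows bring up the secondary tunnel can be illustrated with a minimal sketch of crypto ACL matching: each remote ASA treats any packet from the remote LAN to the headend LAN as interesting. The subnets and addresses are hypothetical, and the sketch assumes the secondary ASA sources its syslog messages and ping replies from its inside interface address.

```python
import ipaddress

# Hypothetical crypto ACL on each remote ASA: remote LAN -> headend LAN is "interesting".
REMOTE_LAN = ipaddress.ip_network("10.1.0.0/16")
HEADEND_LAN = ipaddress.ip_network("10.9.0.0/16")


def is_interesting(src: str, dst: str) -> bool:
    return (ipaddress.ip_address(src) in REMOTE_LAN
            and ipaddress.ip_address(dst) in HEADEND_LAN)


# Hypothetical flows leaving the *secondary* ASA (inside interface 10.1.0.3).
flows = {
    "syslog to headend server": ("10.1.0.3", "10.9.1.50"),
    "echo reply to headend host": ("10.1.0.3", "10.9.2.10"),
    "traffic to the Internet": ("10.1.0.3", "192.0.2.80"),
}
for name, (src, dst) in flows.items():
    verdict = "brings up tunnel" if is_interesting(src, dst) else "not interesting"
    print(f"{name}: {verdict}")
```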

To get around this in the short term, I configured NAT on the primary firewall so that any traffic from the headend destined for the secondary firewall's inside interface is NAT'd to the address of the primary firewall's inside interface. For the syslog traffic, I configured the firewall to send to a local syslog server rather than the one at the headend site. While this seems to work, I am concerned about other services, such as LDAP, that are configured to use servers at the headend site.
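The NAT workaround amounts to a destination translation on the primary ASA, sketched below in Python. All addresses are hypothetical. Because the primary answers on the secondary's behalf, the secondary never sends a reply toward the headend and so never brings up its own tunnel for this flow.

```python
import ipaddress

# Hypothetical addressing for illustration only.
HEADEND_LAN = ipaddress.ip_network("10.9.0.0/16")
PRIMARY_INSIDE = ipaddress.ip_address("10.1.0.2")
SECONDARY_INSIDE = ipaddress.ip_address("10.1.0.3")


def primary_translate_dst(src: str, dst: str) -> str:
    """Destination NAT on the primary ASA for traffic arriving over its tunnel.

    Headend traffic aimed at the secondary ASA's inside address is redirected
    to the primary's inside address; everything else passes unchanged.
    """
    if (ipaddress.ip_address(src) in HEADEND_LAN
            and ipaddress.ip_address(dst) == SECONDARY_INSIDE):
        return str(PRIMARY_INSIDE)
    return dst


print(primary_translate_dst("10.9.2.10", "10.1.0.3"))  # 10.1.0.2
print(primary_translate_dst("10.9.2.10", "10.1.5.5"))  # 10.1.5.5 (unchanged)
```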

I'm looking into possibly setting this up as a single firewall running HA. In that setup, I'll be using two outside interfaces, one to each ISP, with NAT configured for each ISP. Any suggestions are welcome.
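One common way to drive the single-firewall, two-outside-interface design is a tracked primary static default route plus a floating backup route with a higher administrative distance, with NAT per outside interface. The Python sketch below models that selection under those assumptions; the interface names, distances, and NAT addresses are hypothetical. Because only one default route is installed at a time, only one outside interface (and one VPN peer address) is in use, which matches Anuj's point that the old peer is lost and its tunnel cleared.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StaticDefault:
    interface: str           # hypothetical interface names
    distance: int            # administrative distance: lower is preferred
    tracked_up: bool = True  # SLA probe result; an untracked route is always True


# Hypothetical per-ISP interface addresses used for outbound PAT.
NAT_ADDRESS = {"outside-isp1": "203.0.113.2", "outside-isp2": "198.51.100.2"}


def installed_default(routes: List[StaticDefault]) -> Optional[StaticDefault]:
    """A tracked route whose probe fails is withdrawn; the lowest distance wins."""
    usable = [r for r in routes if r.tracked_up]
    return min(usable, key=lambda r: r.distance, default=None)


def egress(routes: List[StaticDefault]) -> Optional[Tuple[str, str]]:
    """Outside interface in use and the address outbound traffic is translated to."""
    route = installed_default(routes)
    return (route.interface, NAT_ADDRESS[route.interface]) if route else None


routes = [StaticDefault("outside-isp1", 1), StaticDefault("outside-isp2", 254)]
print(egress(routes))         # ('outside-isp1', '203.0.113.2')
routes[0].tracked_up = False  # SLA probe through ISP1 fails
print(egress(routes))         # ('outside-isp2', '198.51.100.2')
```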