Re: ASA OSPF Failover

peter · ‎10-08-2008

Hi all,

We've got a redundant network with two routers, two ASAs and then two more routers. We run OSPF between everything, and the ASAs are configured in an active/standby setup. When we fail over, however, OSPF fails for about a minute before it comes back and everything starts working again.

Is there a way to configure OSPF to be stateful, just like other sessions terminating on the ASA (VPN) or sessions through the ASA? One can of course tweak the OSPF convergence timers, but I want it to be stateful.

Anyone?

abinjola · ‎10-08-2008

Routing information is not replicated to the standby PIX Firewall on Stateful Failover, including OSPF routing information. The standby unit will not show OSPF routing information until a failover occurs and its table gets updated.

One workaround is what you mentioned: tweak the OSPF hello timers. The other option would be to lower the failover poll time; however, don't lower it too much, as it might trigger false failovers/switchovers.

Do rate if helpful !

peter · ‎10-08-2008

Hi,

Thanks for your answer. It clears things up. However, it is still no solution to our problem. Do you know if EIGRP will do a stateful failover? We run all Cisco, so it should not be too hard to switch over to EIGRP.

abinjola · ‎10-09-2008

Neither EIGRP nor any other routing protocol on the ASA has the capability to fail over gracefully. There will always be a period of reconvergence. This means that if you are running a dynamic routing protocol on your ASA and you have a failover, you will see a network outage for as long as it takes the routing protocol to reconverge.

j.damsgaard · ‎03-08-2012

Hi Abinjola

Here, 4 years later, we have a similar issue on a set of 5585s running 8.4(3); they are interconnected to two 6509s with a 2 x 10 Gig port-channel.

The software now supports replication of the route table from the active FW to the standby unit.

But when we initiate a failover, we still see that the traffic that should be running over the OSPF link (inside) is blackholed for the 5 seconds (OSPF timers 1 sec hello / 4 sec dead) it takes the former standby ASA to bring up the OSPF neighborship with the 6509.

The question is: is it the ASA or the 6509 that flushes the route entries in the route table when we initiate a failover, a reload, or any other event that could trigger a failover?

Is there any way to mitigate this problem?

This upcoming weekend we will test whether it's the route table on the 6509 or on the ASA that causes the problem.

Thanks for your time.

Jesper Damsgaard

mz331wcisco · ‎04-07-2012

Hello

I ran into the same situation while labbing stateful failover in 8.4(1) with OSPF enabled. I ran 'debug ospf' on the ASA and 'debug ip ospf events' and 'debug ip ospf packets' on the IOS router during the failover. The output is not very clear to me regarding who is causing the OSPF adjacency to flap:

On the IOS router:

*Mar 1 03:38:30.611: OSPF: Neighbor change Event on interface FastEthernet0/1
*Mar 1 03:38:30.611: OSPF: DR/BDR election on FastEthernet0/1
*Mar 1 03:38:30.611: OSPF: Elect DR 10.0.1.145
*Mar 1 03:38:30.611: DR: 10.0.1.145 (Id)
*Mar 1 03:38:30.611: OSPF: Neighbor change Event on interface FastEthernet0/1
*Mar 1 03:38:30.611: OSPF: DR/BDR election on FastEthernet0/1
*Mar 1 03:38:30.611: OSPF: Elect DR 10.0.1.145
*Mar 1 03:38:30.611: DR: 10.0.1.145 (Id)
*Mar 1 03:38:30.615: OSPF: End of hello processing
*Mar 1 03:38:33.431: OSPF: Send hello to 224.0.0.5 area 0 on FastEthernet0/1 from 100.0.11.1
R1#un all

On the ASA:

ASA1# failover exec standby failover active
OSPF: rcv. v:2 t:1 l:48 rid:10.0.1.145
aid:0.0.0.0 chk:5439 aut:0 auk: from inside
OSPF: Rcv hello from 10.0.1.145 area 0 from inside 100.0.11.1
OSPF: End of hello processing
OSPF: Interface inside going Down
OSPF: Neighbor change Event on interface inside
OSPF: DR/BDR election on inside
Switching to Standby
OSPF: Elect BDR 0.0.0.0
ASA1# Elect DR 10.0.1.145
OSPF: Elect BDR 0.0.0.0
OSPF: Elect DR 10.0.1.145
DR: 10.0.1.145 (Id) BDR: none
OSPF: 10.0.1.145 address 10.0.1.145 on inside is dead, state DOWN
OSPF: Neighbor change Event on interface inside
OSPF: DR/BDR election on inside
OSPF: Elect BDR 0.0.0.0
OSPF: Elect DR 0.0.0.0
DR: none BDR: none
OSPF: Remember old DR 10.0.1.145 (id)
OSPF: Interface outside going Down
OSPF: Neighbor change Event on interface outside
OSPF: DR/BDR election on outside
OSPF: Elect BDR 0.0.0.0
OSPF: Elect DR 10.0.8.254
OSPF: Elect BDR 0.0.0.0
OSPF: Elect DR 10.0.8.254
DR: 10.0.8.254 (Id) BDR: none
OSPF: 10.0.8.254 address 10.0.8.254 on outside is dead, state DOWN
OSPF: Neighbor change Event on interface outside
OSPF: DR/BDR election on outside
OSPF: Elect BDR 0.0.0.0
OSPF: Elect DR 0.0.0.0
DR: none BDR: none
OSPF: Remember old DR 10.0.8.254 (id)
OSPF: Interface inside going Up
OSPF: Interface outside going Up
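To make the order of events in the ASA debug above easier to follow, here is a minimal Python sketch that pulls out only the interface state changes, the "is dead" neighbor lines and the "Switching to Standby" marker. The function name and the regular expressions are my own and are matched only against the lines quoted in this post; other ASA releases may word these messages differently. It does not settle which side causes the flap, it only orders what the ASA logged.

```python
import re

# Patterns written against the ASA 'debug ospf' lines quoted in this thread
# (an assumption: other software versions may format them differently).
IFACE_RE = re.compile(r"^OSPF: Interface (\S+) going (Down|Up)$")
DEAD_RE = re.compile(r"^OSPF: (\S+) address \S+ on (\S+) is dead, state DOWN$")


def summarize_ospf_debug(text):
    """Return (event, interface, neighbor) tuples in the order they appear."""
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        m = IFACE_RE.match(line)
        if m:
            events.append(("interface_" + m.group(2).lower(), m.group(1), ""))
            continue
        m = DEAD_RE.match(line)
        if m:
            events.append(("neighbor_dead", m.group(2), m.group(1)))
        elif line == "Switching to Standby":
            events.append(("switching_to_standby", "", ""))
    return events


# Lines copied from the ASA output above.
SAMPLE = """\
OSPF: Interface inside going Down
Switching to Standby
OSPF: 10.0.1.145 address 10.0.1.145 on inside is dead, state DOWN
OSPF: Interface outside going Down
OSPF: 10.0.8.254 address 10.0.8.254 on outside is dead, state DOWN
OSPF: Interface inside going Up
OSPF: Interface outside going Up
"""

for event in summarize_ospf_debug(SAMPLE):
    print(event)
```

On this capture, taken on the unit that is giving up the active role, both interfaces go down and both neighbors are declared dead locally before the interfaces come back up.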

R1 (F0/1 100.0.11.1) --- (E0/0 100.0.11.10 inside) ASA (E0/1 outside 100.0.12.10) ---- (F0/1 100.0.12.2) R2

Did you manage to clarify the situation? It looks like zero downtime with OSPF on 8.4.x is not possible. Even with a 1 sec hello-interval I had a 10 sec outage, and FTP sessions running through the ASA of course failed.

Thank you

Stanislav Bakulin · ‎09-11-2012

I faced the same problem while testing my OSPF area with ASA 8.4.4.

I got a similar result: around 5 seconds for OSPF to converge. But in my case the ASA sits between 2 routers in a totally NSSA area. One of them is an ABR connected to the backbone and the other is an ASBR that redistributes 700+ routes from BGP. OSPF convergence time by itself is now acceptable with 8.4, but when redistribution is involved it takes some 30-40 seconds before connectivity between the backbone and BGP is restored. Needless to say, that is more than enough to kill all active connections.

I tried many timer configurations, and apparently tuning HELLO/DEAD no longer makes much difference with 8.4, as it only affects the exact time when the downtime begins. With the routing table replicated to the standby, the ASA passes traffic without any problem until the "new" active unit sends out its HELLOs, which breaks the area. So if the timer is set to 1s it will fail immediately after failover, and if it is the default 10s it can take up to 10s.
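As a rough illustration of that observation, the sketch below models only when the outage can begin, not how long it lasts. It assumes, as a simplification of my own, that the new active unit sends its first hello somewhere within one hello interval after failover; the function name is hypothetical.

```python
def outage_start_window(hello_interval_s):
    """Return (earliest, latest) seconds after failover at which the new
    active unit's first hello could go out, under the assumption above."""
    if hello_interval_s <= 0:
        raise ValueError("hello interval must be positive")
    return (0.0, float(hello_interval_s))


# Compare a tuned 1s hello with the default 10s hello.
for hello in (1, 10):
    earliest, latest = outage_start_window(hello)
    print(f"hello={hello}s: outage begins within {latest:g}s of failover")
```

In this model a shorter hello only moves the start of the outage closer to the failover, which matches what I saw in the tests.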

I also tried tuning LSA and SPF timers on all devices but it also had no effect. Well, it actually had: ASA crashed with OSPF page fault when the LSA and SPF timers were set to something tiny like 10 100. ASA has fewer timer configuration commands compared to IOS routers.

I tested the same topology with EIGRP instead of the OSPF NSSA area and the results were impressive. It takes the area just a couple of seconds to converge completely, even with those 700 BGP routes. No connectivity loss; at most 1 or 2 ICMP pings might be missed.

Apparently ASA still does not support graceful OSPF restart while the IOS routers seem to support both Cisco’s own nonstop forwarding and RFC3623 approaches.

IOS routers have nsf commands under router ospf configuration and it seems to be enabled by default. There is nothing like that on ASA.

jumora · ‎02-26-2014

OSPF Failover causes 5 second convergence delay

CSCto62499

Description

Symptom:

When using OSPF dynamic routing combined with active/standby failover on an ASA running 8.4, the routes learned via OSPF are replicated to the standby ASA. This is so that if a failover event occurs, traffic using these routes will continue to pass through the new active unit. The problem is that upon a failover event, a 5 second delay is seen in OSPF convergence, which could cause a brief traffic outage.

Conditions:

ASA running 8.4 or later with OSPF as the routing protocol. This does not impact other routing protocols.

Workaround:

Upgrade

8.6(0.0)
100.8(20.1)
8.4(2)
8.5(1.5)
8.5(1.242)
100.7(8.34)
9.0(0.99)
9.0(1)
9.1(1)
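If you want to check a running image against the versions listed above, a small Python sketch like the following can help. The helper names are hypothetical, and comparing only within the same major.minor train is an assumption on my part about how to read the list. Note that earlier posters on 8.4(3) and 8.4.4 still reported a delay, so a match here is not a guarantee of zero downtime.

```python
import re

# Versions copied from the list above.
LISTED = ["8.6(0.0)", "100.8(20.1)", "8.4(2)", "8.5(1.5)", "8.5(1.242)",
          "100.7(8.34)", "9.0(0.99)", "9.0(1)", "9.1(1)"]


def parse_asa_version(text):
    """Turn strings like '8.4(2)' or '8.5(1.242)' into integer tuples."""
    parts = tuple(int(n) for n in re.findall(r"\d+", text))
    if len(parts) < 2:
        raise ValueError(f"not a version string: {text!r}")
    return parts


def listed_in_same_train(running):
    """Listed versions that share major.minor with `running` and are not newer."""
    r = parse_asa_version(running)
    return [v for v in LISTED
            if parse_asa_version(v)[:2] == r[:2] and parse_asa_version(v) <= r]


print(listed_in_same_train("8.4(3)"))  # prints ['8.4(2)']
```

An empty result means the running version is older than every listed version in its train, or its train is not in the list at all.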

Value our effort and rate the assistance!