
Does DR/BDR role affect OSPF convergence time?

lionell01
Level 1

I've attached an LLD of my design. The ASRs have eBGP neighborships on the WAN side. On the LAN side, both ASRs and the firewall are in the same OSPF broadcast network. The L2 switch in between is a 6800 in VSS. Both ASRs learn the same customer routes from eBGP and redistribute them into OSPF, but ASR2 redistributes with a higher metric.

Our OSPF timers are hello 3 seconds and dead 9 seconds.

Update: we have OSPF priorities set as FW 50, ASR1 30, ASR2 20.

We had a planned failover activity in which both LAN cables on ASR1 were pulled. We expected failover to happen within 9 seconds. However, it took 13 seconds for the route on the firewall to move from ASR1 to ASR2. Did the firewall being the DR add to the convergence time? Would it have helped if ASR1 had been the DR? I would really like to understand in detail how convergence works in an OSPF broadcast network.
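
The 9-second expectation can be sanity-checked with a little arithmetic. The sketch below assumes hellos arrive exactly on schedule and that the firewall only notices the failure when its dead timer expires (the switch sits in between, so the firewall sees no link-down event). Under that assumption the dead timer was last reset at most one hello interval before the failure, so detection alone takes between `dead - hello` and `dead` seconds. The function names are illustrative only.

```python
def detection_window(hello: float, dead: float) -> tuple[float, float]:
    """Best-case and worst-case seconds from failure to dead-timer expiry.

    The dead timer restarts on every hello received, so the failure lands
    somewhere within one hello interval after the last reset.
    """
    return (dead - hello, dead)


def residual_delay(observed: float, hello: float, dead: float) -> tuple[float, float]:
    """Time left over for everything after detection (LSA flooding, ACKs,
    SPF scheduling, route install), given the observed total failover time."""
    fastest, slowest = detection_window(hello, dead)
    return (observed - slowest, observed - fastest)


window = detection_window(hello=3, dead=9)
extra = residual_delay(observed=13, hello=3, dead=9)
print(f"detection takes {window[0]} to {window[1]} s")
print(f"post-detection work took {extra[0]} to {extra[1]} s")
```

With the timers from this post, detection accounts for 6 to 9 seconds, which leaves 4 to 7 seconds of the observed 13 for whatever happens after the neighbor is declared dead.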

 

25 Replies

Hi, yes the L2 switches are in VSS

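OK, so the VSS has a port-channel (PO) on the side facing the ASR and a single link on the other side facing the primary firewall (FW Pri).
That is not entirely clear to me, but it could add some delay.
Instead, use a PO on both sides, toward the ASR and toward FW Pri, and then check again.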

Before you make the change, let me double-check this solution.

"the VSS always tries to forward traffic on the locally available links. This is true for both Layer-2 and Layer-3 links. The primary motivation for local forwarding is to avoid unnecessarily sending of data traffic over the VSL link in order to reduce the latency (extra hop over the VSL) and congestion."
I have an idea, but you need to check it on your side.
ASR 60 sends a Hello message, and the port-channel hash algorithm selects the link to VSS61 rather than VSS60. Since VSS60 is directly connected to FW Sec, FW Sec receives the Hello instead of FW Pri,
and FW Pri then starts a new OSPF election and a new adjacency has to be established.
The point of having a PO on both sides, with the same hash algorithm on both sides, is to prevent this case.
To check, disable one member port of the PO on the ASR and see whether OSPF recovers.
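
As a rough illustration of the hashing point above: a port-channel picks one member link per flow by hashing header fields, so every OSPF Hello from the same source follows the same link. The toy sketch below uses CRC32 over the source and destination IP purely for illustration. The real hash inputs and algorithm are platform-specific and configurable, and the source address `10.0.0.1` is a made-up placeholder; `224.0.0.5` is the OSPF AllSPFRouters multicast group.

```python
import zlib


def pick_member_link(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Toy stand-in for a port-channel load-balancing hash.

    Assumption: real hardware uses its own algorithm and field selection;
    CRC32 is used here only to get a deterministic, flow-based choice.
    """
    key = f"{src_ip}->{dst_ip}".encode()
    return zlib.crc32(key) % n_links


# Hypothetical ASR address sending Hellos to AllSPFRouters over a 2-link PO.
first = pick_member_link("10.0.0.1", "224.0.0.5", n_links=2)
second = pick_member_link("10.0.0.1", "224.0.0.5", n_links=2)
print(f"Hello uses member link {first} every time: {first == second}")
```

Because the choice is deterministic per flow, which chassis receives the Hello depends only on the hash result and the number of active members, which is why removing one member port changes the path.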

Hi MHM, The Palo Alto Firewalls are in active/passive mode and only the MAC of the active firewall is learnt by the switches. When traffic from ASR1 via switch1 (upper switch) needs to reach the firewall, it will go over the VSL and then onto the firewall.

The main cause of delay seems to be the ACK and SPF re-calculation. Please do see my response to Joseph along with debugs.

...see below comment 

... see below comment 

paul.driver
Level 1

Hello

I assume you have the default SSO enabled, but do you have NSF enabled for OSPF? Also, you mention the OSPF timers: are these the defaults, or have you set them manually?


Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

@paul driver SSO and/or NSF, on what device(s)?  (Just trying to follow your train of troubleshooting thought.)

Hello @Joseph W. Doherty 
On the VSS. However, I checked the OP again and the VSS is at L2, so I guess it isn't applicable.



I can't sleep when something is on my mind,
so I started writing notes to solve this issue, because 9 to 13 seconds is long.
NOTE:
1.
@lionell01 you mentioned that "only the MAC of the active firewall is learnt by the switches." That is correct, and it means your FW HA config is correct, because only the active FW participates in the OSPF process.
But I ask you again to check the PO on both ASRs.
2. @David Ruess mentioned using BFD, which for me is the best solution to reduce the long failover time. Why?
The delay happens because neither the surviving ASR nor the FW detects the failed ASR quickly.
But you mentioned:
"Yes, we have considered BFD but caveat in Palo Alto Firewalls is that BFD forms only between DR and BDR. When ASR1 is down there is only DR (FW) and DROTHER (ASR2), there would be no BFD so perhaps OSPF would be shut down by BFD?"

Here comes the trick. I asked myself why there would be a new election, since there are already a DR, a BDR, and a DROTHER, but I stopped right away:
this is a broadcast domain, so there must be one DR, one BDR, and one or more DROTHERs.
When the ASR that held one of those roles fails, the routers that are still up run a new election, and since only two routers remain (ASR and FW),
one will be elected DR and the other BDR.

Since this network contains a non-Cisco FW, I went back to the Palo Alto website and found the following:
"When you enable BFD for OSPFv2 or OSPFv3 broadcast interfaces, OSPF establishes a BFD session only with its Designated Router (DR) and Backup Designated Router (BDR). On point-to-point interfaces, OSPF establishes a BFD session with the direct neighbor. On point-to-multipoint interfaces, OSPF establishes a BFD session with each peer."
https://docs.paloaltonetworks.com/pan-os/9-1/pan-os-admin/networking/bfd/bfd-overview/bfd-for-dynamic-routing-protocols


In our case the surviving ASR will definitely be elected DR or BDR, since only two routers remain in the broadcast domain.
So we can use BFD to detect the failed ASR and reduce the recovery time.
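
The reasoning above can be sketched in a few lines. This is a simplified, non-preemptive model of DR/BDR election, not the full procedure from the OSPF specification (it ignores the wait timer and the neighbor state machine). The priorities come from this thread; the router IDs are made-up tie-breakers, and the `bfd_peers` rule simply encodes the Palo Alto statement quoted above that BFD sessions on broadcast interfaces form only with the DR and BDR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    name: str
    priority: int
    router_id: int  # hypothetical value, used only as a tie-breaker


def elect(routers, current_dr=None, current_bdr=None):
    """Simplified election: existing roles are kept while their holder is
    alive; a vacant slot goes to the best remaining eligible router."""
    alive = {r.name: r for r in routers if r.priority > 0}

    def rank(r):
        return (r.priority, r.router_id)

    if current_dr in alive:
        dr = alive[current_dr]            # DR role is not preempted
    elif current_bdr in alive:
        dr = alive[current_bdr]           # BDR is promoted when the DR dies
    else:
        dr = max(alive.values(), key=rank, default=None)

    others = [r for r in alive.values() if r is not dr]
    if current_bdr in alive and alive[current_bdr] is not dr:
        bdr = alive[current_bdr]
    else:
        bdr = max(others, key=rank, default=None)
    return dr, bdr


def bfd_peers(me, dr, bdr):
    """Per the quoted Palo Alto note: sessions only with the DR and BDR."""
    return sorted(r.name for r in (dr, bdr) if r is not None and r.name != me)


fw, asr1, asr2 = Router("FW", 50, 3), Router("ASR1", 30, 1), Router("ASR2", 20, 2)

dr0, bdr0 = elect([fw, asr1, asr2])                   # all three routers up
dr1, bdr1 = elect([fw, asr2], dr0.name, bdr0.name)    # ASR1 has failed

print(f"before: DR={dr0.name} BDR={bdr0.name}")
print(f"after:  DR={dr1.name} BDR={bdr1.name}, FW BFD peers={bfd_peers('FW', dr1, bdr1)}")
```

In this model the firewall stays DR and ASR2 fills the vacant BDR slot, so a BFD session between the firewall and ASR2 is still possible after ASR1 fails.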


