
OSPF Failover between two DC and two links between both DCs

Hi all,

Currently we have a deployment with two DCs connected via OSPF through a pair of LAN-2-LAN links. Both links are /30 networks in OSPF area 0, and adjacencies are established. We now need to make sure that when one link fails, traffic fails over to the other automatically: we have run some link-down tests with the LAN-2-LAN providers and had several issues with the current configuration.

Our scenario consists of a main DC with two Nexus 5672 switches deployed in vPC, where each LAN-2-LAN link is connected to one Nexus switch (L2L link A to Nexus A, L2L link B to Nexus B), and a branch-office DC with a Catalyst 3850 as the core switch, where both L2L links terminate. Both the Nexus vPC pair and the 3850 also act as ABR and ASBR routers.

 

Nexus 1 VLAN 3500 (192.168.3.1/30) ------ LAN-2-LAN ------ Catalyst 3850 VLAN 3500 (192.168.3.2/30)

Nexus 2 VLAN 3501 (192.168.4.1/30) ------ LAN-2-LAN ------ Catalyst 3850 VLAN 3501 (192.168.4.2/30)

 

I know this failover can be configured either with IP SLA or with Cisco PfR (PIRO), but I'd appreciate some advice, as I'm a newbie with Cisco PfR and with advanced routing on Nexus.
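For reference, this is the kind of IP SLA tracking I mean; a rough IOS-XE sketch for the 3850 side, where the SLA/track numbers and the tracked prefix are placeholders for illustration, not our real addressing:

ip sla 10
 ! Probe the far end of link A every 5 seconds.
 icmp-echo 192.168.3.1 source-ip 192.168.3.2
 frequency 5
ip sla schedule 10 life forever start-time now
!
track 10 ip sla 10 reachability
!
! Prefer link A while the probe succeeds; float to link B (AD 250) when it fails.
ip route 10.10.0.0 255.255.0.0 192.168.3.1 track 10
ip route 10.10.0.0 255.255.0.0 192.168.4.1 250

That said, this static approach would bypass what OSPF already gives us, so I'd rather understand the routed options first.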

11 Replies

Abzal
Level 7

Hi,

Some additional details would be helpful.

Are those switches at the same geographic location? Is it some kind of dark fiber between them, or does the link go through an ISP?

Why not just configure equal-cost load balancing? Is there a specific requirement to keep one link as standby?
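For what it's worth, if both /30 links keep the same OSPF cost, ECMP needs almost nothing; a minimal sketch, assuming process ID 1 (IOS-XE installs up to 4 equal-cost OSPF paths by default, so this just makes the intent explicit):

router ospf 1
 ! Install both equal-cost paths toward the other DC.
 maximum-paths 2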

Best regards,
Abzal

Hi @Abzal 

 

Both DCs are in different locations. Our Nexus switches are at a seaport facility and the 3850s are stacked in another DC. The distance between the DCs is about 400 km.

 

My customer has specified a requirement to have a main L2L channel with the other one as backup, but I can also bring up equal-cost load balancing; after all, this implementation is quite complex, mainly on the routing side.

 

Thank you anyway, I'll give it a try.

Hi,

You could also use BFD for fast failure detection. With proper timer tuning, it will improve OSPF convergence time when an alternative path is available.

But first check the 3850's IOS-XE version, because BFD is available starting from 16.3.1:

https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst3850/software/release/16-5/configuration_guide/rtng/b_165_rtng_3850_cg/configuring_bidirectional_forwarding_detection.pdf
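A minimal sketch of what that looks like on an IOS-XE SVI, assuming your interface names; the timer values are only an example and should be tuned to your links:

interface Vlan3500
 ! Negotiate 250 ms BFD hellos; declare the neighbor down after 3 missed packets.
 bfd interval 250 min_rx 250 multiplier 3
 ! Register OSPF as a BFD client on this interface.
 ip ospf bfd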

 

Also have a look at NSF, but for it to work both end devices must be NSF-capable.

https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst3850/software/release/16-3/configuration_guide/b_163_consolidated_3850_cg/b_163_consolidated_3850_cg_chapter_01111001.html#concept_C2FACEEE2D6645DEBE74E706C2D19A53
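Enabling it is a one-liner under the OSPF process; a sketch, assuming process ID 1 (nsf ietf is the graceful-restart alternative keyword):

router ospf 1
 ! Cisco-proprietary NSF; both neighbors must support it.
 nsf cisco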

 

 

Best regards,
Abzal

Hi Abzal,

 

I've taken a look at the BFD feature and it sounds great. In fact, we're implementing it on our Nexus vPC peer link. However, our Catalyst 3850 stack is on software version 03.06.06E, so I'm sure I'll have to upgrade the stack.

We do not have much detail to work with, and that makes it difficult to understand the issue or to give good advice. You have described two Nexus switches connected to the 3850 with a transit link (and transit subnet) from each Nexus to the 3850. You tell us that OSPF is running over these transit links and that adjacencies are formed. Based on that I would expect to have both links active and to see load sharing in operation. But you tell us that there is a customer requirement for primary/backup. Can you tell us what you have done in OSPF to make it primary/backup? Can you provide some more detail about the environment and some specifics about the problem that you encountered?

HTH

Rick

Hi Richard,

 


You have described two Nexus switches connected to the 3850 with a transit link (and transit subnet) from each Nexus to the 3850.

Yeah, that's what we have: a pair of Nexus switches configured in vPC, each one connected to the same Catalyst 3850 stack through LAN-2-LAN links (one L2L for Nexus A and another for Nexus B).

 


You tell us that OSPF is running over these transit links and that adjacencies are formed.

That's right: OSPF runs over these L2L links between the Nexus switches and the 3850, as well as over the vPC peer link, all in area 0.

 

Based on that I would expect to have both links active and to see load sharing in operation. But you tell us that there is a customer requirement for primary/backup. Can you tell us what you have done in OSPF to make it primary/backup?


Sorry, my bad, I was missing some details. I've just checked again and there is load sharing in operation, as the existing OSPF configuration is pretty simple: we've just included the L2L links in area 0.
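On the 3850 it's roughly this (process ID and wildcards assumed; on the Nexus side OSPF is enabled per interface with ip router ospf ... area 0 instead):

router ospf 1
 ! Both L2L transit subnets live in the backbone area.
 network 192.168.3.0 0.0.0.3 area 0
 network 192.168.4.0 0.0.0.3 area 0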

 


Can you provide some more detail about the environment and some specifics about the problem that you encountered?

The main issue encountered is that when one L2L link fails (on the provider's side), traffic between the sites doesn't automatically take the alternate path through the other L2L link, even though both links are in the same OSPF area. It's worth mentioning that all networks created on both the Nexus switches and the 3850 are VLAN-based, including our L2L links.

 

Attached is a logical diagram of our solution; the issue occurs in OSPF area 0.

Thank you for the explanation and especially for the drawing. The main thing that I notice in the drawing is that you have OSPF area 51 at HQ and also have OSPF area 51 at the remote site, but the two area 51s are not connected. That creates a problem. The simple solution is to change one site so that it uses a different OSPF area.

HTH

Rick

Hi,

As far as I know it's not a big deal, even if there are two separate areas with duplicated area IDs connected to the ABRs. LSAs don't actually care about the area ID; from the backbone's point of view, they are two separate areas.

Of course, I am not encouraging anyone to use a duplicated area ID. It's confusing and not good practice in terms of administration.

By the way, I only mention this because I've run into this struggle before.

@ngkin2010 makes a good point. A partitioned area is a big deal if it is area zero, and not such a big deal when it is a non-backbone area. I would still like to see separate area numbers, but I agree that the fundamental problem here is not likely related to the area numbers being used. To understand the problem and to offer good advice we need more information about how OSPF is configured.

HTH

Rick

Hello @Nicolás Machado S. 

 

If you are using SVI-based Layer 3 interfaces, you need to be very careful about which VLANs are permitted over your trunks, because otherwise a DC-to-DC link failure will not bring down the corresponding L3 SVI, and OSPF only realizes something is wrong when the dead interval (40 seconds by default) expires.

The autostate feature keeps an SVI up as long as at least one L2 interface carrying that VLAN is up and in STP forwarding state, regardless of whether it is an access link or a trunk.
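In practice that means restricting each L2L VLAN to the one physical port that faces its circuit; a sketch with an assumed interface name:

interface Ethernet1/10
 switchport mode trunk
 ! Carry only VLAN 3500 on this port, and prune VLAN 3500 from every other
 ! trunk, so when this port fails autostate takes SVI Vlan3500 down with it.
 switchport trunk allowed vlan 3500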

 

The use of BFD is recommended in your scenario, as suggested by @Abzal, and using routed interfaces would also be helpful.
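A routed port sidesteps autostate entirely, since the SVI disappears and the line protocol follows the physical link; a sketch for the 3850 side, interface name assumed:

interface TenGigabitEthernet1/0/1
 ! Turn the switchport into a routed interface.
 no switchport
 ip address 192.168.3.2 255.255.255.252
 ! Add BFD once the stack runs a release that supports it.
 ip ospf bfd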

As noted by @Richard Burts, your OSPF inter-area design is not correct: area 0 should be used as the transit area between different areas.

The smaller site's devices should change their area ID to a different value.
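A sketch of that change on the remote 3850, with process ID, prefix, and the new area number all assumed for illustration:

router ospf 1
 ! Move the remote site's networks out of the duplicated area 51.
 no network 172.16.50.0 0.0.0.255 area 51
 network 172.16.50.0 0.0.0.255 area 52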

 

Hope to help

Giuseppe

 

Hi @Giuseppe Larosa 

 


If you are using SVI-based Layer 3 interfaces, you need to be very careful about which VLANs are permitted over your trunks, because otherwise a DC-to-DC link failure will not bring down the corresponding L3 SVI, and OSPF only realizes something is wrong when the dead interval (40 seconds by default) expires.

The autostate feature keeps an SVI up as long as at least one L2 interface carrying that VLAN is up and in STP forwarding state, regardless of whether it is an access link or a trunk.

 

The use of BFD is recommended in your scenario, as suggested by @Abzal, and using routed interfaces would also be helpful.


Understood. In fact, I've decided to go with BFD. I've run some labs, I'm learning a lot about how it works with OSPF, and I like it a lot! Routed interfaces are something I'm considering too, but the risk is that the change would generate more downtime than expected, which is precisely what we want to avoid during the maintenance window. I'll try BFD over the existing SVIs in both DCs first.
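On the Nexus side, the SVI flavor of that looks roughly like this in my labs; the timers are just an example, and note that NX-OS wants ICMP redirects disabled on BFD interfaces (this also assumes the release supports BFD on SVIs):

feature bfd
!
interface Vlan3500
 ! NX-OS requires redirects off before BFD will run on the interface.
 no ip redirects
 bfd interval 250 min_rx 250 multiplier 3
 ip ospf bfd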

 


As noted by @Richard Burts, your OSPF inter-area design is not correct: area 0 should be used as the transit area between different areas.

The smaller site's devices should change their area ID to a different value.


Well, this OSPF area design fits the project my customer is involved in; it's related to VMware NSX and the use of VXLAN for a disaster recovery environment between the datacenters. It's pretty complex.

 

Now what worries me is the 3850 stack. We're currently on version 03.06.06E, BFD is supported since 16.3.1 Denali (as @Abzal already mentioned), and on the Software Downloads page the suggested version is 16.9.5. Can I upgrade directly from 03.06.06E to 16.9.5, or is there a specific upgrade path for the 3850? I ask because the maintenance window is really delicate for this customer.

 

Thanks a lot.