I work in a large enterprise. We run our own MPLS environment for macrosegmentation. See attached diagram.
We are converting from EIGRP to OSPFv2 at our remote sites. Remote sites are running their own single-area OSPF instances (no area 0). We have one site that is having problems with its metro-E connectivity on PE2, so PE2's iBGP session to the RR is down. PE1 is advertising a redistributed static summary route, 10.72.168.0/22, into VPNv4.
We have our general production VRF (called VRF PROD in the diagram). When PE1/PE2/CE1 were running EIGRP and OSPF in parallel, I was able to contact PE2's VRF PROD loopback from RS CE. When I turned off EIGRP for PE1/PE2/CE1, I was no longer able to reach PE2's VRF PROD loopback. PE2 no longer had a default route.
We are not redistributing BGP into OSPF. We are letting the CEs see site detail, and a default route, and that's it. The result is LSA type 1s and an LSA type 5 for the default route.
OSPF relationships are as implied in the diagram. PE1/PE2/CE1 have VRF PROD relationships in a triangle. They are all point-to-point. PE1 is configured with default-information originate. PE2, because of the BGP session problems, is not. CE1 is configured with capability vrf-lite. PE1 and PE2 are NOT configured with capability vrf-lite since they are the PEs. I would expect PE2 not to install PE1's default route if the down bit were set, but the down bit is not set. The routing bit is not set either. Here is the output from the database for 0.0.0.0/0:
PE2#show ip ospf 12 database external 0.0.0.0

            OSPF Router with ID (10.72.171.242) (Process ID 12)

                Type-5 AS External Link States

  LS age: 1614
  Options: (No TOS-capability, DC, Upward)
  LS Type: AS External Link
  Link State ID: 0.0.0.0 (External Network Number )
  Advertising Router: 10.72.171.241
  LS Seq Number: 8000015D
  Network Mask: /0
        Metric Type: 2 (Larger than any link state path)
        Forward Address: 0.0.0.0
        External Route Tag: 3489725929
I need to understand why PE2 is not installing the default route. Is this expected behavior? If it's not expected behavior, then it points to a software bug. If it IS expected behavior, then I need to understand it so I can assess the impact and decide whether we need to change our design at other dual-PE sites for when a PE experiences a BGP session failure.
It is indeed normal behavior, as per RFC 4577 section 4.2.5.2 (Use of OSPF Route Tags).
Thank you. This absolutely makes sense. I was focused on the DN bit and forgot about the domain ID.
So the follow-up question is what to do in a failure scenario where a PE loses the iBGP session. There are loopback interfaces in VRF PROD that become inaccessible because, for example, PE2 won't install the default route from PE1. Everything in the access layer continues to work, so long as each access switch has an uplink to the PE with the BGP session.
The simplest solution is to point a floating static default route through the VRF PROD interregion on each PE to the partner PE. This doesn't feel like the "right" solution. But any other solution I can think of would be extremely intrusive -- for example, implementing BGP RRs with different ASes for each PE. (So all the PE1s would peer with BGP RRs that use AS 65001, for example, and all the PE2s would peer with BGP RRs that use AS 65002.) Ignoring the OSPF domain would likely work -- just pay attention to the DN bit -- but I don't think there's a way to do that.
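A minimal sketch of the floating static default described above, in IOS-style syntax. The VRF name PROD, the next-hop address 192.0.2.1 (a placeholder for the partner PE's address on the VRF PROD inter-PE link), and the administrative distance of 250 are all assumptions for illustration, not values taken from the actual configuration.

```
! On PE2 (mirror this on PE1, pointing at PE2's address on the same link)
! 192.0.2.1 = placeholder for PE1's address on the VRF PROD inter-PE link
! 250       = administrative distance, higher than OSPF (110) and iBGP (200),
!             so this route is used only when no other default is present
ip route vrf PROD 0.0.0.0 0.0.0.0 192.0.2.1 250
```

One thing to weigh with this design: if both PEs lose their iBGP sessions at the same time, the two statics point at each other and default-routed traffic will loop between the PEs until its TTL expires.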
I've been looking at Cisco documentation that discusses OSPF PE-CE, but I'm not seeing this particular scenario.
You have a domain-tag (VPN route tag) set, which serves the same purpose as the DN bit for OSPF external routes.
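For reference, the External Route Tag in the LSA output above can be decoded by hand. RFC 4577 describes the default VPN route tag as the bits 1101, followed by twelve zero bits, followed by the 16-bit AS number. Assuming PE1 is using that default (nothing in the thread says the tag was set manually), the arithmetic works out as follows:

```
3489725929                = 0xD000FDE9
0xD000FDE9 & 0xF0000000   = 0xD0000000    (leading bits 1101)
0xD000FDE9 & 0x0FFF0000   = 0x00000000    (twelve zero bits)
0xD000FDE9 & 0x0000FFFF   = 0xFDE9 = 65001 (AS number)
```

So the tag encodes AS 65001. A PE in the same AS derives the same default tag, and it ignores a type 5 LSA that arrives carrying its own tag, even when the DN bit is clear.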
Thank you. Got another reply with the same information.
Now the question is, what to do about it. With EIGRP, redistributing the default route from VPNv4 into EIGRP allowed both PEs to use each other's default routes in the event there was an iBGP session failure. Since OSPF is not installing the partner's default route, in a failure scenario (like PE2 not having iBGP sessions to RRs), loopbacks (or other directly-connected interfaces) become inaccessible. So I need some input on how to handle that scenario.
To override the default OSPF behavior on the PE-CE interface, you can either manually configure the domain-tag on PE1 to a different value, or configure capability vrf-lite on PE2, which makes PE2 ignore the DN bit and the domain tag.
Please consider possible routing loops which may happen after these changes.
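The two options above might look roughly like this in IOS-style syntax. This is a sketch under assumptions: process ID 12 is taken from the PE2 output earlier in the thread and may differ on PE1, the VRF name PROD is inferred from the diagram label, and the tag value 100 is an arbitrary placeholder.

```
! Option 1, on PE1: advertise a tag that differs from the one PE2 derives
! (100 is a placeholder value)
router ospf 12 vrf PROD
 domain-tag 100
!
! Option 2, on PE2: turn off the PE-specific checks on received LSAs
router ospf 12 vrf PROD
 capability vrf-lite
```

Only one of the two is needed. After either change, `show ip route vrf PROD 0.0.0.0` on PE2 should show whether the default route from PE1 is now installed. Since both options remove a loop-prevention check, the caution above about routing loops applies once PE2's iBGP session comes back.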