I have an issue with the OSPF downward bit and hope someone has seen this before. In this example it appears to do nothing to prevent routes being learnt via the wrong path, that is, via a backup router that learnt the route from the site's primary router, which in turn received the OSPF route originally redistributed into OSPF by the PE (with the downward bit set).
The documentation says:
"The down bit is used between the PE-routers to indicate which routes were inserted into the OSPF topology database from the MPLS VPN super-backbone and thus shall not be redistributed back in the MPLS VPN super-backbone. The PE-router that redistributes the MP-BGP route as OSPF route into the OSPF topology database sets the down bit. Other PE-routers use the down bit to prevent this route from being redistributed back into MP-BGP. "
Therefore I would not expect a route received with the downward bit set to be installed in either the routing table or the BGP table, yet the output below shows that it is. This has essentially created a routing scenario where core routes are learnt via a dual-attached OSPF access site.
The PE receiving the incorrect route:
7609#sh ip ospf 116 database summary 192.168.104.0
OSPF Router with ID (10.200.204.116) (Process ID 116)
Summary Net Link States (Area 0)
LS age: 1094
Options: (No TOS-capability, DC, Downward)
LS Type: Summary Links(Network)
Link State ID: 192.168.104.0 (summary Network Number)
Advertising Router: 10.200.212.116
LS Seq Number: 80000013
Network Mask: /24
MTID: 0 Metric: 1798
7609#sh ip route vrf RED 192.168.104.0
Routing Table: RED
Routing entry for 192.168.104.0/24
Known via "ospf 116", distance 110, metric 1798, type intra area
Redistributing via bgp 100
Advertised by bgp 100 match internal external 1 & 2 nssa-external 1 & 2
Last update from 10.1.59.138 on GigabitEthernet1/0/1.3684, 00:18:23 ago
Routing Descriptor Blocks:
* 10.1.59.138, from 10.200.4.229, 00:18:23 ago, via GigabitEthernet1/0/1.3684
Route metric is 1798, traffic share count is 1
7609#sh ip bgp vpnv4 vrf RED 192.168.104.0
BGP routing table entry for 100:116:192.168.104.0/24, version 195113
Paths: (1 available, best #1, table RED)
Advertised to update-groups:
10.1.59.138 from 0.0.0.0 (10.200.0.65)
Origin incomplete, metric 1798, localpref 100, weight 32768, valid, sourced, best
Extended Community: RT:100:116 OSPF DOMAIN ID:0x0005:0x000000740200
OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:10.200.204.116:512
mpls labels in/out 312/nolabel
Just a quick blind shot: are you perhaps using the capability vrf-lite configured in the VRF RED on your 7609 that is accepting the LSA even with Down bit set?
No, it's not running vrf-lite. It's a full PE with MPLS to the core.
We are just attempting to set up dual-attached sites with OSPF failover, but it's not so smooth with OSPF as the PE-CE routing protocol. BGP seems to be a lot smoother for this, but in this case I'm stuck with OSPF.
A quick question... what is the router with OSPF RID 10.200.4.229? Your offending routes are advertised by that router. The Down bit is set on LSA-3 that is originated by OSPF router 10.200.212.116, i.e. a different router. I think we are looking at two different things - your router installed an OSPF route from a different router than the one you have shown us the LSA-3 from.
OSPF RID 10.200.4.229 is the WAN Interface of a PPP ADSL service at the access site in question. It is the DR for the local LAN subnet 192.168.104.0/24. This device should be originating this route.
OSPF router 10.200.212.116 is a loopback and the router ID of the PE / LNS where the above PPP ADSL service terminates, not the 7609 in the original outputs I provided. It looks like the downward bit is being set correctly here; however, the route is also being learnt directly from other CPEs (via OSPF on the PE) rather than as the actual redistributed route. That is, all access sites are using area 0 and can all see each other's LSAs via the OSPF process running on the PEs. I think this is the issue.
What I would really like is a good doc that shows best practice use of OSPF as the CE - PE routing protocol where you have dual attached access sites - in our case each service terminates on a different PE and is a different type of link, typically from different layer 2 providers.
I've implemented what I consider to be a workaround, and it now works as expected. I had to filter the updates received by the PE from the primary CPE at each site so that it only accepts the LAN range from that site, not the LAN ranges from other sites that it was learning via the backup CPE, via the PE, via the other site's backup router. I think I was barking up the wrong tree regarding the downward bit. This issue arises because there is indirect OSPF connectivity between all the CPEs connected to the same PE. Basically, if the PE never passed on LSAs learnt from one CPE to another CPE, we would be all OK.
I still have an issue to solve for PPP xDSL services, as they all share the one loopback as an ip unnumbered interface on the PE. The same distribute-list approach won't work there.
Here is what I have done. There must be a less config-intensive and more flexible approach to this.
router ospf 116 vrf RED
 distribute-list prefix RED-acacia-ospf-subnets-inbound in GigabitEthernet1/0/1.3681
 distribute-list prefix RED-acacia-ospf-subnets-inbound in GigabitEthernet1/0/1.3682
 distribute-list prefix RED-geebung-ospf-subnets-inbound in GigabitEthernet1/0/1.3684
 distribute-list prefix RED-kawana-ospf-subnets-inbound in GigabitEthernet1/0/1.3685
 distribute-list prefix RED-goldcoast-ospf-subnets-inbound in GigabitEthernet1/0/1.3686
!
ip prefix-list RED-geebung-ospf-subnets-inbound permit 192.168.103.0/24
ip prefix-list RED-acacia-ospf-subnets-inbound permit 192.168.100.0/24
ip prefix-list RED-acacia-ospf-subnets-inbound permit 0.0.0.0/32
ip prefix-list RED-kawana-ospf-subnets-inbound permit 192.168.104.0/24
ip prefix-list RED-goldcoast-ospf-subnets-inbound permit 192.168.101.0/24
The issue with distribute lists in OSPF is that they only filter the routes computed by SPF before they are installed into the RIB/FIB. The LSAs themselves are not modified and are still flooded as usual. Please note that this is not a limitation of OSPF in particular but of all link-state protocols.
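One way to see this distinction on the PE might be to compare the link-state database with the VRF routing table. This is only a rough sketch reusing the process ID 116 and VRF RED from the outputs earlier in the thread; what is actually displayed will depend on the platform and IOS release.

```
! LSDB view: LSAs originated by the other sites' CPEs should still be listed,
! because an inbound distribute-list does not stop flooding
show ip ospf 116 database router
!
! RIB view: only OSPF prefixes permitted by the inbound distribute-list
! should be installed in the VRF routing table
show ip route vrf RED ospf
```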
Yes, the Down bit was working fine in your particular case. Sadly, I am trying to understand your topology but I am having trouble getting my head around it. Do you think you could post a diagram of your topology showing where the individual OSPF adjacencies are, which CPE-to-CPE OSPF communication is causing all this trouble, and so on? It would help me a lot!
If the customers are using some kind of backdoor links, i.e. backup interconnections in parallel to your MPLS L3VPN, then the OSPF Sham Link would be a solution, but this again is just a blind shot. I would need to understand your topology better.
Hopefully the attached helps a little. The classic sham link scenario does not really apply here. In each case both routers are at the same site and are directly connected to each other and to the local site /24 subnet. It's an active/backup setup, where in all cases the PPP ADSL service is the backup only.
All is working as expected now with the addition of the distribute-lists. It's just very config-intensive and bound to break as things change. There must be a better way (don't use OSPF).
Don't blame OSPF just yet, it is a fine protocol.
I am still thinking that a sham link could actually provide a nice quick fix: the sham link allows you to define an arbitrary cost on it. Now assume that the OSPF costs of the real PPP connections were significantly higher than the OSPF sham link cost, say, 10000 versus a sham link cost of 1. OSPF would therefore prefer the route through the MPLS backbone over the direct PPP links. The routes would still appear as learned by OSPF on your PEs, but the next hop would effectively point into the MPLS cloud. Would that meet your needs?
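As a rough illustration of the sham link idea, a minimal sketch for one PE could look like the following. The loopback number, the 10.255.116.x endpoint addresses and the Virtual-Template1 interface name are all hypothetical and not taken from this thread; the process ID, VRF name and BGP AS number come from the outputs above. The other PE would need the mirror-image configuration. Each endpoint address has to be a /32 in the VRF that is advertised through BGP and not through OSPF.

```
! Hypothetical sham-link endpoint, placed in the customer VRF
interface Loopback116
 ip vrf forwarding RED
 ip address 10.255.116.1 255.255.255.255
!
! Advertise the endpoint via MP-BGP only (do not cover it with an OSPF network statement)
router bgp 100
 address-family ipv4 vrf RED
  network 10.255.116.1 mask 255.255.255.255
!
! Sham link to the remote PE endpoint (hypothetical 10.255.116.2) with a low cost
router ospf 116 vrf RED
 area 0 sham-link 10.255.116.1 10.255.116.2 cost 1
!
! Raise the cost on the backup PPP access interface (interface name is an assumption)
interface Virtual-Template1
 ip ospf cost 10000
```

If this is tried, "show ip ospf 116 sham-links" should indicate whether the sham link has come up.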
Yes that does sound viable Peter. I'll sleep on it and discuss with the team. Thanks for the advice.
I also read that not using area 0 for the PE-CE links changes things, but it's not clear whether this will help at all, as all the CPEs will still be in the same area. I would expect the same behavior as we see now. See http://sites.google.com/site/amitsciscozone/home/important-tips/mpls-wiki/ospf-as-pe-ce-routing-protocol-in-mpls-vpn
PEs and OSPF Area 0
The PE/CE link can belong to any area including area 0 for an OSPF domain. If the PE attaches to the CE using a non-zero area, then the PE router acts as an ABR for that area. The MPLS VPN backbone acts as a Super Backbone.
If the OSPF domain has area 0 routers other than PE routers, then one of those must be a CE router and must have an area 0 link to at least one PE router, using a virtual-link to the PE router. This is necessary because Area 0 should be contiguous and so that inter-area and external routes can be leaked between PE routers and the non-PE OSPF backbone.
Could you please provide the output of the following commands from PE2:
7609#sh ip ospf 116 database summary 192.168.102.0
7609#sh ip ospf 116 database router 192.168.102.0
7609#sh ip route vrf RED 192.168.102.0
7609#sh ip bgp vpnv4 vrf RED 192.168.102.0
The Down bit is set during MP-BGP to OSPF redistribution, but I guess that in your scenario network 192.168.102.0/24 is also advertised from CE3 to CE6 directly via OSPF along the path CE3-PE1-CE1-CE2-PE2-CE6 (in addition to MP-BGP). You should see router LSAs for 192.168.102.0 on PE2.
If the above assumptions are correct, then one solution could be to increase the administrative distance of OSPF routes to more than 200, so that MP-iBGP routes are preferred over OSPF routes. The drawback is non-optimal traffic flow between CE6 and CE2, which would go via MPLS instead of CE6-PE2-CE2, but maybe it is desirable for transit traffic to flow via MPLS.
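A minimal sketch of that change, assuming the same OSPF process 116 and VRF RED shown earlier in the thread and picking 210 as an arbitrary value above the default iBGP distance of 200:

```
router ospf 116 vrf RED
 ! Make all OSPF routes in this process less preferred than iBGP (default distance 200)
 distance 210
```

Afterwards, "sh ip route vrf RED 192.168.102.0" on the PE should show whether the prefix is now known via BGP rather than OSPF. Note that this raises the distance for every OSPF route in the process, including those of directly attached sites, so it would need testing before being rolled out.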
What do you think about it?