Solved: OSPF/EIGRP Redistribution Frustration

m.glosson · ‎02-05-2015

I am having trouble with a secondary connection. Attached is the topology. Everything is reachable (no tunnel problems, etc). This post will focus on the SiteA-2911 router.

The problem is that the ClinicWAN router (as well as the internal L3-switches, etc) sees and favors the EIGRP routes through the backup DMVPN connection. I have taken steps to make this not be the case (even steps that I considered unnecessary) but still the EIGRP gets favored. I know that the administrative distance of OSPF is 110 and EIGRP's is 90 for internal routes and 170 for external routes. I had that in mind, so instead of simply advertising the routes from the SiteA router in the normal way (with "network" commands), I did a redistribute connected so they would get assigned an admin distance of 170 right away.

What I'm actually seeing from the SiteA router is that the MPLS/OSPF connection is being favored (e.g., 10.10.0.0/16). What I see from the VpnBackup and ClinicWAN routers is that routes to 10.50.0.0/16 are sent over the DMVPN. That means that it's doing asymmetric routing. If I purposely kill the MPLS connection (or do a "passive-interface ..." command on the SiteA router), everything goes over the DMVPN without issue. If I take down the EIGRP on the SiteA router ("passive-interface Tu0"), everything routes fine over the MPLS/OSPF.

I have done this in the past multiple times, but it was always done with eBGP in the MPLS, which has an administrative distance of 20 and I haven't had to do any tricks, etc. Converting the sites to BGP is possible, I suppose, but would be a bit of a pain, plus where's the fun in that? The real fun is figuring out THIS problem. :)

Here are some of the relevant configs:

VpnBackup2821

VpnBackup2821#sh run | s ^router
router eigrp 100
 passive-interface GigabitEthernet0/0
 network 10.0.0.0
 no auto-summary

VpnBackup2821#sh ip int br
Interface                  IP-Address      OK? Method Status                Protocol
GigabitEthernet0/1         10.10.10.252    YES NVRAM  up                    up
Loopback1                  69.xxx.yyy.zzz  YES NVRAM  up                    up
Tunnel0                    10.0.250.1      YES NVRAM  up                    up

ClinicWAN

ClinicWan#sh ip int br | e down
Interface                  IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0         10.10.10.250    YES NVRAM  up                    up
GigabitEthernet0/1         192.168.0.2     YES NVRAM  up                    up
Loopback0                  10.1.254.254    YES NVRAM  up                    up

ClinicWan#sh run | s ^router
router eigrp 100
 redistribute ospf 100 metric 50000 50 255 1 1500
 passive-interface GigabitEthernet0/1
 network 10.1.254.254 0.0.0.0
 network 10.10.0.0 0.0.255.255
 no auto-summary
router ospf 100
 router-id 10.10.10.250
 domain-id 0.0.0.0
 log-adjacency-changes
 redistribute static subnets route-map REDISTATIC
 redistribute eigrp 100 subnets
 passive-interface GigabitEthernet0/0
 passive-interface Serial1/0
 network 10.1.254.254 0.0.0.0 area 0
 network 162.97.109.200 0.0.0.3 area 0
 network 192.168.0.0 0.0.0.3 area 0
 default-information originate

SiteA-2911

SiteA-2911#sh ip int br | e down
Interface                  IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0.50      10.50.1.250     YES NVRAM  up                    up
GigabitEthernet0/0.950     10.90.50.250    YES NVRAM  up                    up
GigabitEthernet0/2         69.xxx.yy.zzz   YES NVRAM  up                    up
Multilink1                 67.xx.yyy.zzz   YES NVRAM  up                    up
Tunnel0                    10.0.250.50     YES NVRAM  up                    up

SiteA-2911#sh run | s ^router
router eigrp 100
 network 10.0.250.0 0.0.0.255
 redistribute connected metric 3000 30000 255 1 1500 route-map EIGRP-DISTRIBUTE-RMAP
 distance eigrp 125 170
 passive-interface default
 no passive-interface Tunnel0
router ospf 100
 domain-id 0.0.0.0
 passive-interface default
 no passive-interface Multilink1
 network 10.50.0.0 0.0.255.255 area 50
 network 10.90.50.0 0.0.0.255 area 50
 network 67.xx.yyy.zzz 0.0.0.3 area 0

SiteA-2911#sh run | s ^(ip pref|route-map)
ip prefix-list EIGRP-DISTRIBUTE-PREFIX seq 5 permit 10.50.0.0/16
ip prefix-list EIGRP-DISTRIBUTE-PREFIX seq 10 permit 10.90.50.0/24
route-map EIGRP-DISTRIBUTE-RMAP permit 10
 match ip address prefix-list EIGRP-DISTRIBUTE-PREFIX

What happens is this: if the MPLS goes down, everything switches over to the DMVPN. When the MPLS comes back up, traffic from SiteA goes back to MPLS as desired, but return traffic never switches back to the MPLS until we intentionally kill the DMVPN connection.

Let me know what other output you might like to see (show commands, etc).

Jon Marshall · ‎02-08-2015

I am wondering if the fact that the EIGRP-learned routes got advertised as E2 routes back to PE routers could bear any significance, but at this point, I do not see any.

As usual Peter has highlighted the issue.

It's not a bug and the above is exactly the problem which I managed to replicate in a lab.

The issue is that when everything is working as normal, the Clinic router has an OSPF IA route for 10.50.0.0/16 in its routing table. It is also receiving an EIGRP external from the VPN router for the same network, but as it can't be installed into the routing table it is not advertised to the PE.

When the MPLS link goes down the Clinic router installs the EIGRP route via the VPN router.

When the MPLS link comes back up the Clinic router redistributes this EIGRP route into OSPF and advertises it to the PE router.

The PE router now has two routes for the same network.

One is from the site A PE router and is an IBGP route with a weight of 0.

The other is an OSPF route which it redistributes into BGP and because of this it gets a weight of 32768.

So the PE chooses the one with the higher weight. Because this route is already in the VRF on the PE router as an OSPF E2, there is nothing to advertise back to the Clinic router, which means the Clinic router keeps its EIGRP external route in the routing table.

As soon as you shut the tunnel on the VPN router the Clinic router loses the EIGRP route so doesn't advertise it to the PE which then allows the PE to redistribute the BGP route it learnt from the remote PE into OSPF and your Clinic router receives this.

Obviously when the tunnel is back up the Clinic router receives the EIGRP route again but cannot install it into the routing table because of the OSPF route already there.

I guess it should be a race condition as Peter mentions but I have to say I failed it over a number of times and only once did it fall back to the OSPF route.

So I guess the question is: why are you advertising the branch routes learnt via the DMVPN tunnels to the MPLS network? I.e., if the MPLS network at a branch fails, do you not want all traffic to and from that branch to use the DMVPN tunnels only?

You may have a reason to do it in which case you would probably need to talk to the SP as it would require further configuration on their PE routers.

I also noticed you aren't advertising 10.10.0.0/16 anywhere in your OSPF configuration on the Clinic router although it could be either coming from EIGRP or the statics you are redistributing under your OSPF process.

Or it could be covered by your default route but you mentioned in your original post the branch was seeing a route for the specific network via the MPLS connection.

If it is coming from EIGRP then if you decided to remove the EIGRP to OSPF redistribution you would need a network statement for it under your OSPF configuration.
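If the EIGRP to OSPF redistribution were removed entirely, the change on the Clinic router might look roughly like the sketch below. The area number is an assumption (area 0 is what the other network statements in the posted OSPF config use). GigabitEthernet0/0 is already passive under OSPF in the posted config, so no adjacency should form on the LAN.

```
! Sketch only, not tested against this network.
router ospf 100
 ! Stop injecting EIGRP-learned routes (including the DMVPN branch routes)
 no redistribute eigrp 100
 ! Advertise the HQ LAN natively instead; area 0 is an assumption
 network 10.10.0.0 0.0.255.255 area 0
```

Any other HQ prefixes that currently reach OSPF only through EIGRP would need the same treatment.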

Jon

Jon Marshall · ‎02-09-2015

Matt

Yes, Peter as usual was being a bit modest :-).

I wasn't even considering the PE side of things until he mentioned that and then it kind of clicked because this is a common issue as he said.

The simple answer to your problem is yes you need to filter the branch routes from being advertised to the MPLS network otherwise your failover will never reliably work.

I would use a route map with the EIGRP to OSPF redistribution and just allow only the networks behind the Clinic router.

That should fix the issue.
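A minimal sketch of such a filter on the Clinic router, assuming 10.10.0.0/16 is the only address space behind it (that is the mask given for ClinicWAN and VpnBackup in this thread). The prefix-list and route-map names are made up for illustration.

```
! Hypothetical names: HQ-LOCAL, EIGRP-TO-OSPF
! Permit 10.10.0.0/16 and any longer prefixes inside it
ip prefix-list HQ-LOCAL seq 5 permit 10.10.0.0/16 le 32
!
route-map EIGRP-TO-OSPF permit 10
 match ip address prefix-list HQ-LOCAL
! No further entries: the implicit deny drops everything else,
! including branch routes such as 10.50.0.0/16 learned over the DMVPN
!
router ospf 100
 redistribute eigrp 100 subnets route-map EIGRP-TO-OSPF
```

With this in place the Clinic router should no longer originate an external LSA for a branch prefix after a failover, so the PE has nothing competing with the route it learns from the remote PE.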

In terms of redistributing the other way it's not creating a problem and it may be needed for the MPLS only sites at the moment.

It's difficult to say without knowing more about the branches and the DMVPN, i.e.:

do they have local internet ?

is this a hub and spoke only or is it a spoke to spoke as well ?

But yes, the immediate issue is to stop the advertisement of branch networks back to the MPLS PE from the Clinic router.

Happy to discuss further if you want to.

Jon


Jon Marshall · ‎02-05-2015

What is the subnet mask on the gi0/0.50 interface on the Site A router ?

Jon

m.glosson · ‎02-06-2015

Here are the subnet masks of the networks involved:

SiteA, Gi0/0.50 - 10.50.0.0/16
ClinicWAN and VpnBackup: 10.10.0.0/16

SiteA-2911#sh run int g0/0.50

...
interface GigabitEthernet0/0.50
encapsulation dot1Q 50
ip address 10.50.1.250 255.255.0.0
...

Jon Marshall · ‎02-05-2015

Forgot to ask.

Are the OSPF routes received from remote sites via MPLS seen as OSPF externals on the ClinicWan router ?

Jon

m.glosson · ‎02-06-2015

They are seen as Inter-Area rather than E2, which one might expect. Here's some output from ClinicWan:

ClinicWan#sh ip route | i 10.50.0.0
O IA    10.50.0.0/16 [110/24] via 192.168.0.1, 21:35:41, GigabitEthernet0/1

ClinicWan#sh ip route 10.50.0.0
Routing entry for 10.50.0.0/16
  Known via "ospf 100", distance 110, metric 24, type inter area
  Redistributing via eigrp 100
  Advertised by eigrp 100 metric 50000 50 255 1 1500
  Last update from 192.168.0.1 on GigabitEthernet0/1, 21:30:41 ago
  Routing Descriptor Blocks:
  * 192.168.0.1, from 67.17.126.239, 21:30:41 ago, via GigabitEthernet0/1
      Route metric is 24, traffic share count is 1

Jon Marshall · ‎02-06-2015

So the above is when the ClinicWan router is using its MPLS connection to get to Site A ?

And if MPLS goes down then an E2 route via the VpnBackup2821 is installed in the routing table.

And this route is for 10.50.0.0/16 just the same as the inter area route.

And then when MPLS comes back up it doesn't reinstall the route via MPLS.

Is that correct ?

Jon

m.glosson · ‎02-06-2015

Yes.

No. Since VpnBackup2821 is not running OSPF and communicates over native EIGRP with the SiteA router, when the MPLS is not running, the route appears as "D EX" meaning an externally injected EIGRP route, with 170 as the administrative distance.

Yes, the only mask on 10.50.x.x is /16... there are no other routes in 10.50.0.0 with other masks.

Correct. When MPLS comes back up, the SiteA router starts routing over it, but the routers at HQ (on the 10.10.0.0/16 network) do not... they continue routing over the backup (more accurately, the route in ClinicWan points to VpnBackup2821 which points to SiteA over the Tunnel0 interface).

Once I upgrade the ClinicWan router, I want to see if it makes any difference... when in doubt, just upgrade, right? :)

Jon Marshall · ‎02-06-2015

Once I upgrade the ClinicWan router, I want to see if it makes any difference... when in doubt, just upgrade, right? :)

I always use that as a very last resort as I suspect you do by the sounds of it :-)

I know sometimes it is a bug but I'll try every other option before I have to go down that route.

I blame it on my Unix background, you never rebooted a machine unless you absolutely couldn't fix it any other way :-)

That said I cannot understand how your router is ignoring the OSPF inter area route when the MPLS connection comes back up.

If you could post your full configurations as attachments for the three routers I can try and replicate the problem in a lab environment although I have to say I'm not too hopeful.

Jon

AMediaFilm · ‎02-06-2015

Could you run `show ip os nei` on the Clinic router? Cancel this, I've found the passive command in the config. :)

Just show your `sh ip os rib 10.50.0.0`. Is EIGRP redistributing the route to 10.50 via the DMVPN into OSPF?

m.glosson · ‎02-06-2015

I think the code must be too old for this command. We have plans to upgrade to 12.4(15)T17 next Tuesday, but at the moment, 12.3(14)T4 doesn't have it.

The DMVPN terminates on the VpnBackup2821 as you can see by the existence of the "Tu0" interface on it. That router is not running OSPF at all.

Peter Paluch · ‎02-07-2015

Friends,

Please allow me to join.

This issue is very interesting. At first, I thought this is one of those race conditions seen in networks with multiple bi-directional redistribution points. However, this does not appear to be one of those cases.

Let me think aloud, just freely commenting on the process of network convergence after MPLS goes down and back up. Perhaps we'll come across some point we have been missing.

So when the MPLS connection goes down, all routes present at SiteA are advertised only via DMVPN over to VpnBackup in EIGRP and over the LAN to ClinicWAN. There they get redistributed into OSPF and perhaps even reinjected into the MPLS (depending on where the MPLS connection was cut). Obviously, OSPF is used as the PE-CE protocol here, and because ClinicWAN was shown to see the routes from SiteA as O-IA routes, no doubt this is done thanks to proper BGP/OSPF interaction on the provider's PE routers. I am wondering if the fact that the EIGRP-learned routes got advertised as E2 routes back to PE routers could bear any significance, but at this point, I do not see any.

At this point, as a check, performing show ip ospf database on ClinicWAN should display a number of LSA-5 (AS External Link State) for individual EIGRP-learned networks that got redistributed into OSPF, with ClinicWAN being their originator. Can we confirm this by an actual output?
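One way to narrow that check to a single prefix is sketched below. Both keywords are long-standing options of show ip ospf database, but whether 12.3(14)T4 accepts them exactly as written has not been verified here.

```
! LSA-5 entries for the SiteA prefix, whichever router originated them
ClinicWan#show ip ospf database external 10.50.0.0
! Only the LSAs that ClinicWan itself originated
ClinicWan#show ip ospf database self-originate
```

If ClinicWan shows up as the advertising router for the 10.50.0.0 external LSA, the EIGRP-learned route is indeed being reinjected toward the PE.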

When the MPLS connection is reinstated, the PE routers should advertise the prefixes from SiteA as LSA-3 to ClinicWAN which should install them into its LSDB, and after running SPF, these prefixes should be installed into its routing table as well. Now we know that this is not happening. Therefore, my immediate question is: After the MPLS connection is first torn down to let the EIGRP routes take over, and is then reinstated and a minute or so has passed to allow OSPF and the MPLS provider's BGP to settle down, is it possible to see the LSA-3 (Summary Net Link State) covering the SiteA prefixes learned from the PE router on the ClinicWAN when running show ip ospf database? If yes then there is something wrong with the ClinicWAN router itself, as after running SPF, these routes should definitely be offered to the RIB with a better AD than the existing EIGRP-learned routes. If not, however, then there is an issue with the SiteA routes being carried over the MPLS (OSPF-BGP-OSPF sequence of redistribution), and this will need further investigation.
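The check described above could be run along these lines once the MPLS link has been back up for a minute or so. This is only a sketch; 10.50.0.0 is the SiteA prefix from the earlier output.

```
! Is a Type-3 (summary) LSA for the SiteA prefix present in the LSDB?
ClinicWan#show ip ospf database summary 10.50.0.0
! Which source currently owns the route in the routing table?
ClinicWan#show ip route 10.50.0.0
```

Comparing the two answers the question directly: an LSA-3 present while the RIB still shows the EIGRP external route points at ClinicWan itself, while a missing LSA-3 points at the provider side.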

I am looking forward to your reply!

Best regards,
Peter

Jon Marshall · ‎02-08-2015