OSPF Adjacency flapping over Metro E

mitchell helton · 11-15-2012

Hey folks,

We have several remote sites and we're using OSPF for dynamic routing. We have site-to-site VPNs set up, so when our Metro E WAN goes down, we can still have connectivity to our remote locations.

We ran across an issue yesterday where our Metro E provider had a failing switch that was causing our OSPF adjacencies to flap even though the interfaces on "our" gear were staying up. So our VPN would kick in, then 30 seconds later our WAN would come back up and take over, then it would go back down, and so on.

What can we do to stop this? I've read about IP event dampening and LSA throttling, although I don't know much about either. I tried some testing with them last night and didn't have much luck. Are these potential solutions for us? What are you guys doing?

Currently, we just shut down the problematic interfaces until the problem is resolved. We'd like something with a little less administrative overhead.

Thanks,

Mitch

d-schuemann · 11-15-2012

Routing loop or MTU?

XIE YAO · 11-16-2012

Hi Mitch,

IP event dampening is unlikely to help since, as you mentioned, the interface didn't go down.
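
To show why, here is a minimal Python model of the exponential-decay penalty that IP event dampening applies. Each interface flap adds a penalty that halves every half-life. The interface is suppressed once the penalty exceeds the suppress threshold, and released once it decays below the reuse threshold. The parameter values are an assumption based on commonly cited IOS defaults for the interface `dampening` command. Because only interface state changes add penalty, an adjacency that flaps while the interface stays up never accumulates any.

```python
# Toy model of IP event dampening (exponential-decay penalty).
# ASSUMPTION: parameter values mirror commonly cited IOS defaults for the
# interface "dampening" command; verify them for your platform and release.
HALF_LIFE_S = 5.0         # penalty halves every 5 seconds
REUSE = 1000.0            # unsuppress once penalty decays below this
SUPPRESS = 2000.0         # suppress once penalty exceeds this
MAX_SUPPRESS_S = 20.0     # longest an interface can stay suppressed
FLAP_PENALTY = 1000.0     # penalty added per interface flap

# Capping the penalty means decay from the cap back to REUSE takes exactly
# MAX_SUPPRESS_S seconds.
MAX_PENALTY = REUSE * 2 ** (MAX_SUPPRESS_S / HALF_LIFE_S)


class InterfaceDampener:
    def __init__(self) -> None:
        self.penalty = 0.0
        self.last_update_s = 0.0
        self.suppressed = False

    def _decay(self, now_s: float) -> None:
        # Apply exponential decay for the time since the last update.
        elapsed = now_s - self.last_update_s
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_S)
        self.last_update_s = now_s

    def flap(self, now_s: float) -> None:
        """Record one interface flap (a physical/line-protocol state change)."""
        self._decay(now_s)
        self.penalty = min(self.penalty + FLAP_PENALTY, MAX_PENALTY)
        if self.penalty > SUPPRESS:
            self.suppressed = True

    def is_suppressed(self, now_s: float) -> bool:
        self._decay(now_s)
        if self.suppressed and self.penalty < REUSE:
            self.suppressed = False
        return self.suppressed


d = InterfaceDampener()
for t in (0.0, 1.0, 2.0):      # three interface flaps in quick succession
    d.flap(t)
print(d.is_suppressed(2.0))    # True: penalty is above SUPPRESS
print(d.is_suppressed(10.0))   # False: penalty has decayed below REUSE
```

If the interface never flaps, `flap()` is never called and the penalty stays at zero, which matches the situation described in this thread.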

Perhaps this article on LSA throttling will provide some help.

http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/fsolsath.html
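
The feature guide above has the exact semantics. As a rough illustration, LSA and SPF throttling use an initial delay, then a hold time that doubles while events keep arriving, capped at a maximum wait. The sketch below computes the wait before each run for a neighbor that keeps flapping back to back. The timer values are hypothetical, and IOS's reset behavior after a quiet period is not modeled. Throttling spaces recalculations out, but each flap can still eventually move traffic, so on its own it may not stop the call drops.

```python
# Simplified model of OSPF SPF/LSA throttling: exponential backoff between
# successive runs triggered by a continuously flapping neighbor.
# ASSUMPTION: timer values are illustrative, not recommendations; the
# reset-after-quiet-period behavior of real implementations is not modeled.
from __future__ import annotations


def throttle_waits(start_ms: int, hold_ms: int, max_wait_ms: int,
                   runs: int) -> list[int]:
    """Wait (ms) before each of `runs` back-to-back SPF calculations."""
    waits = []
    current_hold = hold_ms
    for i in range(runs):
        if i == 0:
            waits.append(start_ms)            # first event: short initial delay
        else:
            waits.append(min(current_hold, max_wait_ms))
            current_hold *= 2                 # double the hold for the next run
    return waits


print(throttle_waits(50, 200, 5000, 7))
# [50, 200, 400, 800, 1600, 3200, 5000]
```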

SOcchiogrosso · 11-16-2012

Perhaps look into BFD (Bidirectional Forwarding Detection) for your OSPF neighbors over your Metro-E connections.

http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/fs_bfd.html#wp1053749

This should cover you for fast detection of failures that occur between your two CE devices.

In my opinion, it is also cleaner than LSA throttling.
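
To put "fast detection" in numbers, the sketch below compares how long plain OSPF hellos can take to notice a dead neighbor with BFD's asynchronous-mode detection time, as described in RFC 5880. It assumes the common OSPF defaults of a 10 s hello and 40 s dead interval on broadcast and point-to-point links. The BFD values (50 ms intervals, multiplier 3, as in an IOS interface line such as `bfd interval 50 min_rx 50 multiplier 3`) are illustrative only.

```python
# Compare failure-detection time for plain OSPF hellos versus BFD.
# ASSUMPTIONS: OSPF dead interval of 40 s (default on broadcast and
# point-to-point links); BFD values of 50 ms and multiplier 3 are examples.

def ospf_detection_ms(dead_interval_s: int = 40) -> int:
    """OSPF declares the neighbor down up to one dead interval after the last hello."""
    return dead_interval_s * 1000


def bfd_detection_ms(local_min_rx_ms: int, remote_min_tx_ms: int,
                     remote_multiplier: int) -> int:
    """Asynchronous-mode detection time (RFC 5880): the remote multiplier
    times the slower of the local receive and remote transmit intervals."""
    return remote_multiplier * max(local_min_rx_ms, remote_min_tx_ms)


print(ospf_detection_ms())           # 40000
print(bfd_detection_ms(50, 50, 3))   # 150
```

Faster detection makes failover quicker. Whether it also reduces the flapping itself depends on what the provider's failing switch is doing.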

--
CCNP, CCIP, CCDP, CCNA: Security/Wireless
Blog: http://ccie-or-null.net/

Raju Sekharan · 11-16-2012

Hi

These flaps can be caused by packet loss, MTU issues, etc.

There are different tools for different issues:

1. If you want to dampen a flapping interface, you can use IP event dampening.

2. To monitor end-to-end connectivity, you can use BFD.

3. A third option is to check the MTU along the path and shut the interface if there is an MTU issue. This involves multiple tools; I normally use steps like these when troubleshooting frequently failing links:

a) Use IP SLA to track the neighbour's IP address. Select the IP SLA payload size based on the MTU of the path.

b) Use a track object to track the IP SLA.

c) Configure an ACL on the other side to reject any fragmented packets sent from your router's exit interface to the other side's IP address.

d) Configure an EEM applet that applies an ACL on your incoming interface to block OSPF hellos when the track goes down (that is, when IP SLA reports a failure).

e) Configure another EEM applet to remove the ACL when the track comes back up (that is, once the MTU issue on the path is resolved).
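
The steps above can be sketched as a small state machine. The probe payload is sized so the packet exactly fills an unfragmented IPv4 packet at the expected MTU, assuming ICMP echo and no IP options. If the path MTU shrinks, the probe gets fragmented, the step c) ACL drops the fragments, the probe fails, and the track goes down. The action strings are placeholders, not real EEM syntax, and the 1500-byte MTU is an assumption.

```python
# Toy model of steps a) through e): an IP SLA probe sized to the path MTU,
# a track object that follows the probe, and two EEM-style actions.
# ASSUMPTIONS: IPv4 without options, ICMP echo probes, 1500-byte MTU;
# action strings are placeholders, not real EEM syntax.
from __future__ import annotations

IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


class MtuWatchdog:
    def __init__(self) -> None:
        self.track_up = True        # step b): track state mirrors the probe
        self.acl_applied = False    # step d): ACL blocking OSPF hellos

    def probe_result(self, reply_received: bool) -> list[str]:
        """Feed one probe result; return the actions it triggers, if any."""
        actions = []
        if self.track_up and not reply_received:
            # Step d): track goes down, so block OSPF hellos.
            self.track_up = False
            self.acl_applied = True
            actions.append("apply ACL blocking OSPF hellos")
        elif not self.track_up and reply_received:
            # Step e): track comes back up, so remove the ACL.
            self.track_up = True
            self.acl_applied = False
            actions.append("remove ACL")
        return actions


print(icmp_payload_for_mtu(1500))   # 1472
w = MtuWatchdog()
for ok in (True, False, False, True):
    print(w.probe_result(ok))
# [] then ['apply ACL blocking OSPF hellos'] then [] then ['remove ACL']
```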

Thank you

Raju

mitchell helton · 11-16-2012

Hey guys... thanks for the suggestions and advice. I'm going to investigate these options and play around in a test lab to see what works for us.

Just to be clear, there are no routing loops, and I'm not sure what you mean by an MTU issue. Maybe I explained the situation wrong. I'll try to explain better just so we're all on the same page:

We have a Metro E circuit and a site-to-site VPN that are both always up. The site-to-site link has a higher OSPF cost, so it is never the preferred path unless the Metro E is down.
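
A minimal Python sketch of this setup: OSPF prefers the lowest-cost path whose adjacency is up, so every Metro E flap moves traffic twice. The costs of 10 and 100 are hypothetical; only their ordering matters.

```python
# Toy model: OSPF uses the lowest-cost path whose adjacency is up.
# ASSUMPTION: costs of 10 (Metro E) and 100 (VPN) are hypothetical.
from __future__ import annotations

from typing import Optional


def preferred_path(paths: dict[str, tuple[int, bool]]) -> Optional[str]:
    """paths maps a name to (ospf_cost, adjacency_up)."""
    candidates = [(cost, name) for name, (cost, up) in paths.items() if up]
    return min(candidates)[1] if candidates else None


def count_path_changes(metro_e_up: list[bool]) -> int:
    """How many times traffic moves between paths as the Metro E flaps."""
    changes = 0
    current = None
    for up in metro_e_up:
        path = preferred_path({"metro-e": (10, up), "vpn": (100, True)})
        if current is not None and path != current:
            changes += 1   # each change is a reconvergence that can drop calls
        current = path
    return changes


print(preferred_path({"metro-e": (10, True), "vpn": (100, True)}))    # metro-e
print(preferred_path({"metro-e": (10, False), "vpn": (100, True)}))   # vpn
print(count_path_changes([True, False, True, False, True]))           # 4
```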

The problem I'm trying to solve is this: when the WAN goes down, the site-to-site VPN picks up (as it should), but we drop phone calls while OSPF reruns the SPF algorithm. When the WAN comes back up, we drop calls again as SPF runs again. This continues until the ISP fixes their problem or we shut down the interfaces.

What I want to do is similar to route dampening in BGP. After an adjacency change, make sure the neighbor isn't going to flap again before it reforms an adjacency. I don't even know if this is possible.
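
The behavior described here can be modeled as a hold-down. Traffic fails over to the VPN immediately but returns to the Metro E only after it has been continuously up for a set time. This is a hypothetical sketch, not an OSPF feature, sampled once per second for simplicity. Whether it can be approximated on IOS (for example with a track object that has an up delay driving EEM, along the lines of Raju's steps) is an untested assumption to verify in a lab.

```python
# Hypothetical hold-down: fail over to the VPN immediately, but return to the
# Metro E only after it has been up for `hold_down` consecutive samples.
# ASSUMPTION: not an OSPF feature; one sample per second for simplicity.
from __future__ import annotations


def choose_paths(metro_e_up: list[bool], hold_down: int) -> list[str]:
    paths = []
    up_streak = hold_down   # assume the Metro E starts out stable
    for up in metro_e_up:
        up_streak = up_streak + 1 if up else 0
        paths.append("metro-e" if up and up_streak >= hold_down else "vpn")
    return paths


def count_changes(paths: list[str]) -> int:
    """Number of times traffic moves from one path to the other."""
    return sum(1 for a, b in zip(paths, paths[1:]) if a != b)


samples = [True, False, True, True, False, True, True, True, True]
print(count_changes(choose_paths(samples, hold_down=0)))   # 4
print(count_changes(choose_paths(samples, hold_down=3)))   # 2
```

With the hold-down, the brief recovery in the middle of the flapping never pulls traffic back onto the Metro E, so there are two reconvergence events instead of four.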

Anyway, I just wanted to make sure I was being clear. Perhaps the suggestions you folks made will solve this problem; I haven't had a chance to look yet.

Thanks again.