Unexpected OSPF behavior in datacenter

richardgosen · 11-27-2013

Hi all,

I have an OSPF issue in a datacenter. The infrastructure consists of a redundant core and aggregation layer built with Cisco C6509-E chassis: Sup720-3BXL in the core and Sup720-3B in the aggregation layer. The interconnects are EtherChannels of two 10 Gbps interfaces configured as L3 port-channels.

My core is an MPLS VPN super-backbone network that carries many MPLS VPNs. These MPLS VPNs are terminated on the datacenter core routers described above. There the MPLS VPN routes are redistributed from BGP into a per-VRF OSPF NSSA process and propagated to the aggregation layer. Up to this point everything is stable.

Everything at the aggregation layer is routed statically and redistributed into OSPF, and at the core into BGP. Almost every OSPF area is an NSSA. Now I wanted to do some dynamic routing between different OSPF areas by using firewalls and RIPv2. To get all the routes from a specific MPLS VPN into OSPF, I had to convert the NSSA into a regular area and do mutual redistribution between the MPLS VPN and the corresponding OSPF VRF process.

At the core I tag the routes redistributed from BGP into OSPF and filter them inbound with a distribute-list route-map based on the tag added by the other core router (and vice versa on the other core router).

After some time I saw other OSPF processes, as well as the target OSPF process, go down because the dead timer expired. At some point the core router stopped seeing the hellos from the aggregation layer and missed four in a row. The MPLS VPN that I redistributed into OSPF has about 1450 prefixes that need to be learned. My first thought was that this was too much for OSPF to handle, but according to the specs the Sup720-3BXL and Sup720-3B can handle 1,000,000 and 256,000 routes respectively. There is no MTU issue, and the interfaces in the EtherChannels are not under any heavy load.
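To get a rough feel for how much flooding 1450 redistributed prefixes generate, here is a back-of-the-envelope Python sketch. It assumes one Type 5 (AS-external) LSA per prefix, since the routes are now redistributed into a regular area; a 36-byte LSA (20-byte LSA header plus a 16-byte body with a single metric); Link State Update packets filled up to a 1500-byte IP MTU; and it ignores acknowledgements and retransmissions. The function name is hypothetical.

```python
import math

LSA_HEADER = 20       # bytes, common OSPF LSA header
TYPE5_BODY = 16       # mask + E/metric + forwarding address + route tag (one metric)
IP_HEADER = 20        # bytes, no IP options assumed
OSPF_HEADER = 24      # bytes, OSPFv2 packet header
LSU_COUNT_FIELD = 4   # "# LSAs" field at the start of a Link State Update


def lsu_flood_estimate(num_prefixes: int, mtu: int = 1500) -> tuple[int, int]:
    """Return (LSU packets, total IP bytes) to flood one Type 5 LSA per prefix once.

    Simplified model: LSAs are packed as densely as the MTU allows.
    """
    lsa_size = LSA_HEADER + TYPE5_BODY                              # 36 bytes
    per_packet_overhead = IP_HEADER + OSPF_HEADER + LSU_COUNT_FIELD  # 48 bytes
    lsas_per_packet = (mtu - per_packet_overhead) // lsa_size
    packets = math.ceil(num_prefixes / lsas_per_packet)
    total_bytes = num_prefixes * lsa_size + packets * per_packet_overhead
    return packets, total_bytes


if __name__ == "__main__":
    packets, total = lsu_flood_estimate(1450)
    print(f"{packets} LSU packets, {total} bytes per full flood")
```

Under these assumptions, 1450 prefixes come to 37 update packets and about 54 KB per full flood from one originating router. Both core routers redistribute BGP, so each originates its own set of Type 5 LSAs and the database would hold roughly twice that. This is trivial on 10 Gbps links, but it is not small if OSPF control traffic is rate-limited somewhere in the control plane.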

IOS version: s72033-adventerprisek9_wan-mz.122-33.SXI10.bin

Core router config:

router ospf 600 vrf VPN-A
 log-adjacency-changes
 auto-cost reference-bandwidth 200000
 redistribute bgp 65000 subnets tag 600
 passive-interface default
 no passive-interface Port-channel1.600
 no passive-interface Port-channel2.600
 network 0.0.0.0 255.255.255.255 area 600
 distribute-list route-map DENY_OSPF_600_601 in

route-map DENY_OSPF_600_601 deny 10
 match tag 600 601

route-map DENY_OSPF_600_601 permit 20
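The inbound filter follows the usual IOS route-map logic: clauses are evaluated in sequence order, the first matching clause decides, a clause without a match statement matches everything, a "match tag" with several values matches if the route carries any of them, and a route that matches no clause hits the implicit deny at the end. Below is a minimal Python model of that logic with the route-map above encoded as data. Clause and evaluate are hypothetical names, only "match tag" is modelled, and the assumption (from "vice versa") is that the other core router tags its redistributed routes with 601. Keep in mind that for OSPF a "distribute-list in" only controls which routes are installed in the routing table; the filtered LSAs are still kept in the link-state database and flooded, so this filter does not reduce the OSPF flooding load.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clause:
    seq: int
    action: str                                  # "permit" or "deny"
    match_tags: Optional[frozenset[int]] = None  # None: no match statement, matches every route


def evaluate(route_map: list[Clause], tag: int) -> bool:
    """Return True if a route carrying `tag` is permitted by the route-map."""
    for clause in sorted(route_map, key=lambda c: c.seq):
        # "match tag 600 601" matches if the route's tag equals any listed value
        if clause.match_tags is None or tag in clause.match_tags:
            return clause.action == "permit"  # first matching clause decides
    return False  # implicit deny at the end of every route-map


# route-map DENY_OSPF_600_601 as configured on the core router
DENY_OSPF_600_601 = [
    Clause(10, "deny", frozenset({600, 601})),
    Clause(20, "permit"),
]

if __name__ == "__main__":
    for tag in (600, 601, 0):  # 0 = untagged route
        print(tag, "permit" if evaluate(DENY_OSPF_600_601, tag) else "deny")
```

Routes tagged 600 or 601 are kept out of the routing table; everything else, including untagged routes, falls through to the permit 20 clause.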

Aggregation router config:

router ospf 600 vrf VPN-A
 log-adjacency-changes
 auto-cost reference-bandwidth 200000
 capability vrf-lite
 redistribute static subnets
 passive-interface default
 no passive-interface Port-channel1.600
 no passive-interface Port-channel2.600
 network 0.0.0.0 255.255.255.255 area 600

Am I missing something here? Could it have something to do with the LSA refresh timer?

Has anyone seen this behavior before?

richardgosen · 11-29-2013

I found the issue that was causing this behavior.

There was this command in the configuration:

"mls qos protocol ospf police 32000 1000"

There were also a lot of dropped OSPF packets.

Anyway, I'm happy now.
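To illustrate how "mls qos protocol ospf police 32000 1000" can starve OSPF, here is a minimal Python sketch of a single-rate token bucket. It assumes the rate is in bits per second (32 kbps) and the burst in bytes (1000), that exceeding packets are dropped without consuming tokens, and it ignores how the PFC3 hardware actually measures and accounts packets. The packet sizes and timings are hypothetical: 68 bytes is roughly an OSPF hello with one neighbor at the IP layer, and 1488 bytes stands in for a near-full-MTU Link State Update.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Single-rate token bucket: rate in bits/s, depth in bytes (assumed units)."""
    rate_bps: int
    burst_bytes: int

    def __post_init__(self) -> None:
        self.tokens = float(self.burst_bytes)  # bucket starts full
        self.last = 0.0                        # time of the previous arrival, seconds

    def offer(self, t: float, size: int) -> bool:
        """Return True if a `size`-byte packet arriving at time `t` conforms."""
        refill = (t - self.last) * self.rate_bps / 8  # bytes earned since last arrival
        self.tokens = min(float(self.burst_bytes), self.tokens + refill)
        self.last = t
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False  # exceeding packet: assumed dropped, consumes no tokens


def simulate(bucket: TokenBucket, arrivals: list[tuple[float, int, str]]) -> dict[str, int]:
    """Count dropped packets per kind; arrivals are time-ordered (time_s, size_bytes, kind)."""
    drops: dict[str, int] = {}
    for t, size, kind in arrivals:
        if not bucket.offer(t, size):
            drops[kind] = drops.get(kind, 0) + 1
    return drops


if __name__ == "__main__":
    # Hypothetical traffic: two mid-size LSUs back-to-back, then hellos and a large LSU.
    arrivals = [
        (0.000, 480, "lsu"),
        (0.001, 480, "lsu"),
        (0.002, 68, "hello"),   # bucket nearly drained by the two LSUs
        (10.002, 68, "hello"),  # next hello 10 s later, after the bucket has refilled
        (10.003, 1488, "lsu"),  # larger than the 1000-byte burst
    ]
    print(simulate(TokenBucket(rate_bps=32000, burst_bytes=1000), arrivals))
```

With these numbers the bucket refills at only 4000 bytes per second, so two back-to-back updates are enough to get the following hello dropped, and in this model a packet larger than the 1000-byte burst can never conform at all. If the policer is an aggregate for all OSPF traffic on the box, every OSPF process competes for the same bucket, which would fit with other processes losing their adjacencies as well.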

Richard Burts · 11-29-2013

Thanks for posting back to the forum to indicate that the issue is solved and what the problem was. +5 for an interesting and valuable solution.

I find this one an interesting reminder that we need to be careful about the impact on management traffic when we implement QoS.

HTH

Rick

daniel.dib · 11-29-2013

It's an interesting discussion. The problem with policing control-plane traffic is deciding whether a packet is good or bad. In some cases it's trivial, but for things like CDP, BPDUs, and LACP it can be difficult to determine how to police them.

Daniel Dib
CCIE #37149
CCDE #20160011
