MLSCEF-DFC1-4-FIB_EXCEPTION_THRESHOLD:

Keith McElroy · ‎12-04-2012

OK, as you can see, I got that error as well as these:

*Aug 8 00:03:37: %MLSCEF-SP-4-FIB_EXCEPTION_THRESHOLD: Hardware CEF entry usage is at 95% capacity for MPLS protocol.

*Aug 8 00:03:36: %MLSCEF-SP-STDBY-4-FIB_EXCEPTION_THRESHOLD: Hardware CEF entry usage is at 95% capacity for MPLS protocol.

*Aug 8 00:03:39: %MLSCEF-DFC2-4-FIB_EXCEPTION_THRESHOLD: Hardware CEF entry usage is at 95% capacity for MPLS protocol.

*Aug 8 00:03:41: %MLSCEF-DFC1-4-FIB_EXCEPTION_THRESHOLD: Hardware CEF entry usage is at 95% capacity for IPv4 unicast protocol.
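When many of these messages scroll past on the console, it can help to pull out which module reported the threshold and for which protocol. The following is only a minimal sketch: the regular expression is built from the four messages quoted above and is an assumption about the format, not an official message grammar, and `parse_fib_threshold` is a hypothetical helper name.

```python
import re

# Pattern inferred from the four log lines quoted in this post.
# Other IOS releases may format the message differently.
FIB_THRESHOLD = re.compile(
    r"%MLSCEF-(?P<unit>[A-Z0-9-]+?)-(?P<severity>\d)-FIB_EXCEPTION_THRESHOLD: "
    r"Hardware CEF entry usage is at (?P<percent>\d+)% capacity "
    r"for (?P<protocol>.+?) protocol"
)


def parse_fib_threshold(line):
    """Return the reporting unit, severity, percent and protocol, or None."""
    match = FIB_THRESHOLD.search(line)
    if match is None:
        return None
    return {
        "unit": match.group("unit"),
        "severity": int(match.group("severity")),
        "percent": int(match.group("percent")),
        "protocol": match.group("protocol"),
    }


logs = [
    "*Aug 8 00:03:37: %MLSCEF-SP-4-FIB_EXCEPTION_THRESHOLD: Hardware CEF entry usage is at 95% capacity for MPLS protocol.",
    "*Aug 8 00:03:36: %MLSCEF-SP-STDBY-4-FIB_EXCEPTION_THRESHOLD: Hardware CEF entry usage is at 95% capacity for MPLS protocol.",
    "*Aug 8 00:03:39: %MLSCEF-DFC2-4-FIB_EXCEPTION_THRESHOLD: Hardware CEF entry usage is at 95% capacity for MPLS protocol.",
    "*Aug 8 00:03:41: %MLSCEF-DFC1-4-FIB_EXCEPTION_THRESHOLD: Hardware CEF entry usage is at 95% capacity for IPv4 unicast protocol.",
]

for line in logs:
    print(parse_fib_threshold(line))
```

In this sample, every unit that logged (active and standby supervisor, both DFCs) reports the same 95% figure, which suggests the table filled everywhere at once rather than on a single line card.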

Little background on what I have and what was being done when this happened. I hope this makes sense from a design standpoint, but I am working with the hardware and services I have been given and trying to make it function.

Currently we have a 7609 that is doing PE as well as full Internet table border duties. I don't have dedicated route reflectors or any router redundancy, as I am fairly limited by budget. We are preparing to roll out MPLS enterprise services and, in preparation, I was planning on pushing the Internet table into an "Internet" VRF so I can easily bleed over routes to give users Internet access. I also planned to put the phone switch in that VRF so it can have public access and serve MPLS customers that need voice (this will be most of them). I am limited to only this 7609 for this POP, connected to a 6509 that does interior switching and interconnects. I am using a SUP720 with 1GB of RAM. I am currently consuming approximately 650MB of RAM, and the processor averages 10% and spikes to 60% max during BGP sweeper.

FIB TCAM maximum routes :
=======================
Current :-
-------
 IPv4 + MPLS         - 512k (default)
 IPv6 + IP Multicast - 256k (default)

There is my FIB TCAM

Total routes:          427677
IPv4 unicast routes:   427571
IPv4 Multicast routes: 3
MPLS routes:           102
IPv6 unicast routes:   1
IPv6 multicast routes: 0
EoM routes:            0

And there is the current usage which, as you can see, is still within the allotment. The router locked up during the errors, but I was connected on the console so I could see everything. It stated I ran out of memory, gave errors for not being able to allocate labels (I realize I would probably have to increase the allocation for LDP) and turned off CEF/distributed forwarding. I have attached a text file with the output I got from the console. I deleted the BGP neighbor failures that contained any IPs, but the rest is untouched. If anyone has any suggestions or ideas, I would appreciate the feedback.
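For reference, the steady-state headroom implied by the counters above can be worked out directly. This is a rough sketch only: it assumes "512k" means 512 × 1024 entries and that IPv4 unicast and MPLS routes each take one entry in the shared IPv4 + MPLS partition. Neither assumption is stated in the output itself.

```python
# Assumption: "512k" is 512 * 1024 entries; the platform may count differently.
PARTITION_ENTRIES = 512 * 1024

# Counters copied from the output quoted above.
ipv4_unicast = 427571
mpls = 102

# Assumption: one TCAM entry per IPv4 unicast or MPLS route.
used = ipv4_unicast + mpls
free = PARTITION_ENTRIES - used
utilization = used / PARTITION_ENTRIES * 100

print(f"used={used} free={free} utilization={utilization:.1f}%")
```

Under those assumptions the partition sits a little above 80% in normal operation, well short of the 95% figure in the log messages, so something must have added a large number of entries temporarily.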

Raju Sekharan · ‎12-04-2012

The default allocation for IPv4 + MPLS is 512K routes and you have approximately 430K in use. So if you try to push more routes into IPv4 + MPLS, it can lead to a FIB TCAM exception.

You have very few routes in ipv6 and multicast. If you don't have any plans to use multicast or ipv6 on this, then you can decrease the allocation for ipv6 and multicast

mls cef maximum-routes ipv6

mls cef maximum-routes ip-multicast

This requires a reload of the router. Also, if you do this, make sure that you don't have multiple boot statements in your router. If you don't have space in your rommon, it can lead to continuous reloads of the router.

Thanks

Raju

Keith McElroy · ‎12-05-2012

I wasn't actually pushing any more routes. The routes were exactly the same, just moving from global to a VRF. I still had a good 80k+ open routes in the FIB. That is why I am confused, because the CPU spiked, memory was gone and the FIB was demolished when they all had plenty of headroom.

Raju Sekharan · ‎12-06-2012

Hi Keith,

What were the steps you took to move the routes from global to the VRF?

Was there any point in time when global and the VRF were both learning Internet routes?

From the logs it is clear that something caused the FIB TCAM utilization to hit the limit.

If you hit MLS FIB Exception, then it requires a reload of modules to recover from it

Thank you

Raju

Keith McElroy · ‎12-06-2012

Well, I first moved all the interfaces over to the new VRF. From there I added the EIGRP and BGP config for the VRF. I didn't remove the global config for BGP, but I wouldn't think it would matter since no interfaces were left in the global table anyway. It ran for a good 3-5 minutes after the changes and then seemed to choke as I was finalizing and cleaning up the last of the config. Obviously this was likely during the BGP convergence time, so I am now thinking maybe it was still removing everything from the global table while inserting the VRF routes, and the overlap caused the problem? I couldn't check during the event because of the lockups. I have never watched an entire Internet table be removed, so I am not sure how long it takes to free everything up. I have no lab with similar equipment because of budget issues.

Any thoughts on the CPU and memory issue? It seems strange that it would peg those out as well. Even as it is now, the router seems to have moments of freezing when I am connected to it over SSH, even though the usage is never that high. I am just wondering if the SUP720 can't handle what I am doing with it. I had an issue with the same type of router in another POP, but the memory was nearly out at the time, so I ended up swapping to RSP720s and now it is much better about resources.

I was also wondering if the design made sense. I have dealt with MPLS before, but never designed it ground up to allow MPLS and Internet routes and this seemed like the only logical way to do it without manually bleeding things over from the global.

Thanks for your responses.

Raju Sekharan · ‎12-06-2012

Hi Keith,

1. If your global BGP was up, all the prefixes it learned from its neighbors would still be in the FIB TCAM. So if global BGP was up while you were bringing up BGP in the VRF, you would have hit the exception because there is not enough space in the FIB TCAM for both copies.

2. Without a "show proc mem" during the issue, it is difficult to find the process consuming more memory

If you had learned complete internet table on both Global and VRF that could have caused the memory utilization to increase.

If you hit low memory, CEF gets disabled on the router. Packets are then processed by the CPU, which leads to high CPU utilization.
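The overlap described in point 1 can be put into numbers. This is a hedged sketch, not a statement of how the hardware accounts for entries: it assumes a 512 × 1024 entry partition, one entry per prefix in both the global table and the VRF, and it ignores any extra MPLS label entries the VRF prefixes might need. The route counts are the ones Keith posted earlier.

```python
# Assumptions: partition size and one entry per prefix (global or VRF).
PARTITION_ENTRIES = 512 * 1024
THRESHOLD = 0.95            # the percentage named in the log messages

used_now = 427571 + 102     # IPv4 unicast + MPLS routes from the posted output
internet_prefixes = 427571  # size of the table being re-learned in the VRF

# Entries needed if global and VRF both hold the full table at once.
peak = used_now + internet_prefixes
peak_percent = peak / PARTITION_ENTRIES * 100

# How many VRF prefixes fit before the 95% warning fires.
threshold_entries = int(PARTITION_ENTRIES * THRESHOLD)
room_before_warning = threshold_entries - used_now

print(f"peak entries if both copies coexist: {peak} ({peak_percent:.0f}% of partition)")
print(f"VRF prefixes that fit before the 95% warning: {room_before_warning}")
```

Under these assumptions only a fraction of the second copy fits before the threshold is crossed, which is consistent with the router running for a few minutes into BGP convergence and then failing.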

Thank you

Raju

Keith McElroy · ‎12-06-2012

OK, that sounds pretty solid and makes sense. Would it just be best then to pull all the BGP config first and then migrate over? Do you know about how long it takes for the FIB to clear after all that is done? I am having to do this work in a maintenance window with no backup router, so it is kind of rough and I have a tight deadline and have to try to keep downtime to a minimum.

Raju Sekharan · ‎12-06-2012

Hi Keith,

If you remove the BGP configs, that should be enough. Even shutting down the neighbors will do.

Once you remove the BGP neighbors, the routes should get deleted immediately from the routing table and the TCAM.

I understand your difficulty of not having a backup router

Thanks

Raju

Keith McElroy · ‎12-06-2012

Alright, I will give this a try in the next maintenance window. Thanks for your assistance.

Raju Sekharan · ‎12-06-2012

Please check the following outputs to see the status once you remove the BGP configs:

1. Show ip route summary

2. Show mls cef summary
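If the summary output is captured to a file during the maintenance window, a small script can confirm that the global routes have really gone before BGP is enabled in the VRF. This is a minimal sketch: it assumes the counters appear as `label: number` lines like the ones Keith posted, and the function names, the 512 × 1024 capacity and the 95% limit are illustrative assumptions.

```python
import re


def parse_route_counts(text):
    """Collect 'label: number' lines, e.g. 'IPv4 unicast routes: 427571'."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)\s*:\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def room_for(counts, incoming_prefixes, capacity=512 * 1024, limit=0.95):
    """True if the incoming prefixes fit under the limit (assumed one entry each)."""
    used = counts.get("IPv4 unicast routes", 0) + counts.get("MPLS routes", 0)
    return used + incoming_prefixes <= capacity * limit


# Sample text in the shape of the output posted earlier in this thread.
before_cleanup = """Total routes: 427677
IPv4 unicast routes: 427571
IPv4 Multicast routes: 3
MPLS routes: 102
IPv6 unicast routes: 1
IPv6 multicast routes: 0
EoM routes: 0
"""

counts = parse_route_counts(before_cleanup)
print(counts["IPv4 unicast routes"], room_for(counts, incoming_prefixes=427571))
```

The check should only pass once the IPv4 unicast count has dropped to roughly the number of connected and IGP routes.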

Thank you

Raju

Keith McElroy · ‎01-17-2013

Hey Raju,

This is a bit late on a response, but I just now was able to change since we had network change freezes for the holidays.

I got everything up and it looked clean, with no errors this time, but I had a weird problem. We have 2 Internet links there and the route tables populated completely inbound. When I checked outbound, the advertised routes were 0. The IBGP links were advertising just fine to our other routers, but I could not get the summary to advertise to the ISPs. It was only one summary route for our local IP blocks from that site. The weird part was that it seemed to work for a short period, then died, and I have no idea why. Just as a refresher, we put all the links, including the ones that go to the IBGP peers, into the VRF called EXTERNAL.

I verified the routes were in the IGP table and the same summary route was in the table internally, so that looked clean. I literally was copying the current setup over and just moving it to the VRF BGP setup, so it is exactly the same otherwise. I went so far as to add network statements for the specific interfaces just to see if it made a difference, no go. The fact that it worked for a period, then stopped is sort of confusing, but I saw no issues with memory or anything.

Raju Sekharan · ‎01-17-2013

Hi Keith,

How are you advertising the prefix outside? Is it through a network statement, or are you advertising a prefix received via IBGP?

Could you post the BGP table entry for the prefix that needs to be advertised, and the advertised routes towards that neighbor?

Thanks

Raju

Keith McElroy · ‎01-17-2013

I originally was just using aggregate to advertise the whole subnet. The subnet is directly connected. I attempted to use some network statements for grins to see if it would come up and that didn't happen. No matter what I tried, I couldn't get the prefix to advertise out to the EBGP peers, but they went fine to the IBGP peers (although there was a blip for about 2 minutes where it started working with me changing nothing). It might be a software glitch, not sure, I am on 12.2.33SRC3.

Raju Sekharan · ‎01-17-2013

Hi Keith,

What is the current status? Did it start advertising the prefixes, or are you still facing the problem?

SRC3 is old code.

Thanks

Raju

Keith McElroy · ‎01-17-2013

It was during a maintenance window, so I had to roll it back to get everything running again. Everything is clean now.