07-03-2009 06:42 AM - edited 03-04-2019 05:19 AM
Dear all
I am dealing with a BGP flap issue and have looked at pretty much everything I can think of.
The OutQ never drops to zero. This is on a route reflector that has 2 legs to a distribution switch. The IGP is IS-IS and the platform is a 7301.
Since it is a reflector, it has many BGP sessions. All sessions behave as expected except the one to the primary reflector.
I checked for L2 issues and MTU issues. I have no QoS, nor do I see a need for it: since it is a reflector, not much traffic is carried on the GE ports.
Extended pings between loopbacks work fine with no drops, and load balancing is per destination.
I had the fiber switched and the connectors changed, and disabled one of the redundant legs to stop load balancing.
As you can see, I have been through pretty much everything I can think of, so any suggestions are appreciated!
TIA
Sam
07-03-2009 11:07 AM
Hello Sam,
BGP route reflectors with many BGP peers are required to perform a lot of process switching.
The problem could be at the buffering stage, on both sides, at two levels:
interface
system buffers
There are some old guidelines about them:
increase hold-queue to 1000 both in and out on physical interfaces.
tune system buffers
There is an auto-tune option in modern IOS images.
There is an algorithm that handles the hold queue, called SPD (Selective Packet Discard).
This has to be tuned too, with commands like:
spd headroom 750
ip spd mode aggressive
ip spd queue max-threshold 999
ip spd queue min-threshold 998
SPD provides preferential treatment to IP routing protocol messages, IS-IS, and CDP.
These guidelines were provided by Cisco for a service provider customer.
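A rough sketch of how these guidelines might look in configuration, assuming a hypothetical interface name (GigabitEthernet0/1). The SPD lines are copied from the list above; some of them are hidden or release-dependent commands, so check them against your IOS image before applying anything:

```
! Interface level: deeper hold queues in both directions
! (GigabitEthernet0/1 is an assumed interface name)
interface GigabitEthernet0/1
 hold-queue 1000 in
 hold-queue 1000 out
!
! System buffers: automatic tuning, where the image supports it
buffers tune automatic
!
! SPD tuning, values as quoted above
spd headroom 750
ip spd mode aggressive
ip spd queue max-threshold 999
ip spd queue min-threshold 998
```

To judge whether the change makes a difference, compare the input queue drops in `show interfaces` and the misses and failures counters in `show buffers` before and after.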
Hope this helps
Giuseppe
07-03-2009 11:34 AM
Thanks Giuseppe !
I had indeed increased the hold queue to 4096 in/out earlier, in vain.
Since other clients are fine, I am starting to suspect recursive routing as a cause. The session keeps going up and down even after I increased the timers between both reflectors to let it settle.
Sam
07-04-2009 12:04 AM
I used a temporary workaround by peering between the reflectors' physical interfaces instead of their loopbacks. This works with no flaps, which suggests the issue is not related to QoS, timers, or router resources but to a possible routing / tag switching error.
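For anyone comparing the two setups, this is roughly what the difference looks like. The AS number and addresses are placeholders, not the real ones: 192.0.2.2 stands for the other reflector's loopback and 198.51.100.2 for its physical interface address.

```
router bgp 65000
 ! Original peering, loopback to loopback (the session that flaps)
 neighbor 192.0.2.2 remote-as 65000
 neighbor 192.0.2.2 update-source Loopback0
 !
 ! Workaround peering, physical interface to physical interface
 neighbor 198.51.100.2 remote-as 65000
```

The two sessions use different source and destination addresses, so their packets are routed (and possibly labeled) toward different prefixes across the core. That is why a difference in behavior between them points at forwarding rather than at the BGP process itself.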
RRF1-PE1-COR01==COR02-PE2-RRF2
Tracing the LSP between loopbacks looks good, keeping in mind there is no tag switching between RRF and PE.
I can still ping between the RRFs' loopbacks (extended) using the max MTU.
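A loopback test of this kind can be run along the following lines. The address is a placeholder for the other reflector's loopback, and 1500 assumes a standard Ethernet MTU. The df-bit keyword stops a router in the path from masking an MTU problem by fragmenting:

```
! Full-size packets between loopbacks, fragmentation not allowed
! (192.0.2.2 is an assumed address)
ping 192.0.2.2 source Loopback0 size 1500 df-bit repeat 100
!
! TCP segment size in use on the BGP session, if the release reports it
show ip bgp neighbors 192.0.2.2 | include segment
```

If the large pings pass but the session still stalls, it is worth comparing the reported segment size against the smallest MTU on the path, since BGP updates are usually much larger than keepalives.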
Any thoughts would be a great help!
TIA
Sam
07-08-2009 11:27 PM
I ran a debug on the suspected buggy 7301 and could see that it sends a keepalive only once for the given neighbor, but regular ones to the other peers.
In short, it sends only one to establish the session, then skips the next two until the session tears down.
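For reference, a debug of this kind can be taken roughly as follows (192.0.2.2 is a placeholder for the problem neighbor). Debugs cost CPU on a busy reflector, so keep them short:

```
! Log BGP keepalives sent and received
debug ip bgp keepalives
!
! Negotiated hold time and keepalive interval for the problem neighbor
! (192.0.2.2 is an assumed address)
show ip bgp neighbors 192.0.2.2 | include hold time
!
! Turn debugging off again
undebug all
```

With the IOS default timers of 60 seconds keepalive and 180 seconds hold time, a peer that hears nothing for three keepalive intervals lets its hold timer expire and resets the session. That would fit the pattern of two missed keepalives followed by a teardown.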
Any thoughts ?
Sam
07-28-2010 04:18 AM
Hi All,
Having the same problem.
Has anyone managed to solve this issue?
10x
Eyal