BGP flap - OutQ

cisco_lad2004 · ‎07-03-2009

Dear all

I am dealing with a BGP flap issue and have looked at pretty much everything I can think of.

The OutQ never drops to zero. This is on a route reflector that has two legs to a distribution switch. The IGP is IS-IS and the platform is a 7301.

Since it is a reflector, it has many BGP sessions. All of them behave as expected, except the one to the primary reflector.
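
For anyone following along, OutQ here is the per-neighbor column in the BGP summary that counts messages queued for transmission to that peer. A minimal sketch of the commands that show it, assuming a placeholder peer address of 192.0.2.2 (not an address from this network):

```
! OutQ is a column in the summary table, one row per neighbor
show ip bgp summary
! Per-neighbor detail: state, timers, message counters
show ip bgp neighbors 192.0.2.2
```

On a healthy session OutQ normally returns to 0 once the pending updates have been sent.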

I checked for L2 and MTU issues. I have no QoS, nor do I see a need for it: since it is a reflector, not much traffic is carried on the GE ports.

Extended pings between loopbacks work fine with no drops, and load balancing is per destination.

I had the fiber swapped and the connectors changed, and disabled one of the redundant legs to stop load balancing.

As you can see, I have been through pretty much everything I can think of, so any suggestions are appreciated!

TIA

Sam

Giuseppe Larosa · ‎07-03-2009

Hello Sam,

BGP route reflectors with many BGP peers have to perform a lot of process switching.

The problem could be at the buffering stage on both sides on two levels:

interface

system buffers

There are some old guidelines about them:

increase hold-queue to 1000 both in and out on physical interfaces.

tune system buffers

There is an auto-tune option in modern IOS images.
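
As a rough sketch of the two guidelines above, assuming the uplink is GigabitEthernet0/1 (a placeholder interface name) and an image that supports automatic buffer tuning:

```
! Placeholder interface name; repeat on each physical uplink
interface GigabitEthernet0/1
 hold-queue 1000 in
 hold-queue 1000 out
!
! Automatic buffer tuning, only on images that support it
buffers tune automatic
```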

The hold queue is handled by an algorithm called SPD (Selective Packet Discard).

This has to be tuned too, with commands like:

spd headroom 750

ip spd mode aggressive

ip spd queue max-threshold 999

ip spd queue min-threshold 998

SPD gives preferential treatment to IP routing protocol messages, IS-IS and CDP.
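
To see whether these queues are actually dropping packets, something like the following could be run before and after tuning. The interface name is a placeholder, and `show ip spd` may be hidden or unavailable on some images:

```
! The input queue line reports size/max/drops/flushes
show interfaces GigabitEthernet0/1
! Buffer pool hits, misses and failures
show buffers
! Current SPD mode and thresholds
show ip spd
```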

These guidelines were provided by Cisco for a service provider customer.

Hope to help

Giuseppe

cisco_lad2004 · ‎07-03-2009

Thanks Giuseppe !

I had indeed already increased the hold queue to 4096 in and out, in vain.

Since the other clients are fine, I am starting to suspect recursive routing as the cause. The session keeps going up and down even after I increased the timers between both reflectors to let it settle.
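
One way to test the recursive routing theory is to check how the peer's loopback is being resolved. A minimal sketch, assuming 192.0.2.2 stands in for the other reflector's loopback:

```
! The route to the peer loopback should come from IS-IS, not from BGP
show ip route 192.0.2.2
! CEF view of the same destination and its next hop
show ip cef 192.0.2.2
```

If the loopback turns out to be reachable through a BGP-learned prefix, the session depends on routes it carries itself, which can make it flap.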

Sam

cisco_lad2004 · ‎07-04-2009

I used a temporary workaround: peering between the reflectors' physical interfaces instead of their loopbacks. This works with no flaps, which suggests the issue is not related to QoS, timers or router resources, but possibly to a routing or tag switching error.

RRF1-PE1-COR01==COR02-PE2-RRF2
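
A minimal sketch of the two peering styles, assuming a hypothetical AS 65000, 192.0.2.2 as the other reflector's loopback and 198.51.100.2 as its physical interface address (none of these values come from this network):

```
router bgp 65000
 ! Loopback peering: the session is sourced from Loopback0
 neighbor 192.0.2.2 remote-as 65000
 neighbor 192.0.2.2 update-source Loopback0
 !
 ! Workaround: peer on the physical interface address, no update-source
 neighbor 198.51.100.2 remote-as 65000
```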

Tracing the LSP between loopbacks looks good, keeping in mind that there is no tag switching between RRF and PE.

I can still ping between the RRFs' loopbacks (extended ping) using the maximum MTU.

Any thoughts would be a great help!
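
The MTU check described above could be run along these lines, assuming 192.0.2.2 as the remote loopback and 1500 bytes as the path MTU (both placeholders):

```
! Source from the loopback and set DF so oversized packets fail instead of fragmenting
ping 192.0.2.2 source Loopback0 size 1500 df-bit repeat 100
```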

TIA

Sam

cisco_lad2004 · ‎07-08-2009

I ran a debug on the 7301 that I suspect is hitting a bug, and could see that it sends a keepalive only once to the affected neighbor, but regular ones to the other peers.

In short, it sends only one keepalive to establish the session, then skips the next two until the session tears down.
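
A sketch of how the keepalive behavior could be observed and compared against the negotiated timers, with 192.0.2.2 as a placeholder neighbor address. With the IOS defaults of a 60-second keepalive and a 180-second hold time (the timers in this thread were changed, so the real values may differ), a peer that receives one keepalive and nothing after it will drop the session when its hold timer expires:

```
! Log keepalives sent and received
debug ip bgp keepalives
! Negotiated hold time and keepalive interval for this neighbor
show ip bgp neighbors 192.0.2.2 | include hold time|keepalive
! Turn debugging off afterwards
undebug all
```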

Any thoughts ?

Sam

Eyal Hezi · ‎07-28-2010

Hi All,

Having the same problem.

Has anyone managed to solve this issue?

10x

Eyal