BGP Neighbor Flap !!!

arjunsarkar · 06-23-2009

Hello,

My customer is seeing frequent BGP neighbor flaps:

ADJCHANGE: neighbor x.x.x.x Down BGP Notification received

.Jun 2 14:07:56.705 BST: %BGP-5-ADJCHANGE: neighbor x.x.x.x Up

While troubleshooting the issue, I found that the BGP timers are set to 5 sec keepalive / 10 sec hold time on one side and 5 sec keepalive / 15 sec hold time on the other side. On other routers I also see keepalive and hold time set to 5 sec and 10-15 sec.

I have managed to reproduce the problem in GNS3 and see the following:

============================

router bgp 1300
 no synchronization
 bgp log-neighbor-changes
 timers bgp 5 10
 neighbor 10.13.1.1 remote-as 1200
 no auto-summary

R3#
*Mar 1 00:10:02.107: %BGP-3-NOTIFICATION: received from neighbor 10.13.1.1 4/0 (hold time expired) 0 bytes
*Mar 1 00:10:02.115: %BGP-5-ADJCHANGE: neighbor 10.13.1.1 Down BGP Notification received
*Mar 1 00:10:26.315: %BGP-5-ADJCHANGE: neighbor 10.13.1.1 Up

R0#sh runn | sec bgp
router bgp 1200
 no synchronization
 bgp log-neighbor-changes
 timers bgp 5 15
 neighbor 10.12.1.2 remote-as 1200
 neighbor 10.13.1.2 remote-as 1300
 no auto-summary

R0#
*Mar 1 00:40:31.483: %BGP-5-ADJCHANGE: neighbor 10.13.1.2 Down BGP Notification sent
*Mar 1 00:40:31.483: %BGP-3-NOTIFICATION: sent to neighbor 10.13.1.2 4/0 (hold time expired) 0 bytes
*Mar 1 00:40:59.207: %BGP-5-ADJCHANGE: neighbor 10.13.1.2 Up

=================================

Could you please tell me whether the BGP timers are set incorrectly and whether that is the cause of the BGP neighbor flaps?

Note: my customer uses a 7206VXR router running IOS 12.4(15)T8.

Regards

Arjun

cisco_lad2004 · 06-23-2009

Arjun,

BGP timers do not need to be the same on both sides. They are negotiated early in session setup, and the session settles on the lower hold time of the two neighbors, so the mismatch itself is not the cause.
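Sam's point can be sketched in Python as a minimal illustration, assuming the RFC 4271 rule in which the hold time is what gets negotiated: each router uses the smaller of its own configured hold time and the one received in its neighbor's OPEN message. The function name `negotiated_hold_time` is hypothetical, and how IOS adjusts the keepalive interval after negotiation is not modelled here.

```python
def negotiated_hold_time(local_hold: int, peer_hold: int) -> int:
    """Hold time a BGP session settles on, per the RFC 4271 rule.

    Each speaker takes the smaller of its own configured hold time and the
    hold time carried in the peer's OPEN message, so both ends agree.
    A value of 0 disables the hold timer and, being the minimum, wins.
    """
    for value in (local_hold, peer_hold):
        # RFC 4271: a hold time must be either 0 or at least 3 seconds.
        if value != 0 and value < 3:
            raise ValueError(f"invalid hold time: {value}")
    return min(local_hold, peer_hold)


# Hold times from the GNS3 lab: R3 has "timers bgp 5 10", R0 has "timers bgp 5 15".
print(negotiated_hold_time(local_hold=10, peer_hold=15))  # as seen from R3 -> 10
print(negotiated_hold_time(local_hold=15, peer_hold=10))  # as seen from R0 -> 10
```

Both routers end up with a 10-second hold time, so the mismatch alone does not break the session; what matters is how small that agreed hold time is compared with the 5-second keepalive.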

Check the interfaces for errors or resets, as these are a typical reason for a TCP session to fail or flap.

HTH

Sam

arjunsarkar · 06-23-2009

Sam,

Thanks for the reply. I see that one serial interface has about 2021 resets.

But I see that the BGP adjacency changes happen because the hold timer expires. So if the BGP timers are not the cause of the problem, how can we troubleshoot further?

Regards

Arjun

Giuseppe Larosa · 06-23-2009

Hello Arjun,

Using a 3:1 ratio between hold time and keepalive is recommended, so that two consecutive BGP keepalives must be missed before the neighbor sends the BGP notification message.

With a 2:1 ratio, missing a single BGP keepalive can be enough to trigger the BGP notification.
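The ratio argument can be checked with a small Python sketch. It is a simplification under stated assumptions: keepalives are sent on an exact schedule with no jitter or transit delay, timer values are whole seconds, and a keepalive due at the very moment the hold timer expires is treated as too late. The function name `keepalives_that_can_be_lost` is hypothetical.

```python
def keepalives_that_can_be_lost(keepalive: int, hold: int) -> int:
    """Consecutive keepalives that may be lost without the hold timer expiring.

    Idealised model (assumption): the peer sends a keepalive exactly every
    `keepalive` seconds, with no jitter or transit delay, and the hold timer
    restarts at t = 0 when a keepalive arrives. A keepalive due exactly at
    t = hold races the timer, so it is not counted as safe.
    """
    if keepalive <= 0 or hold <= 0:
        raise ValueError("timers must be positive")
    # Keepalives due strictly before expiry: n * keepalive < hold, n = 1, 2, ...
    due_before_expiry = (hold - 1) // keepalive
    # At least one of them must arrive to restart the timer.
    return max(due_before_expiry - 1, 0)


for keepalive, hold in [(5, 10), (5, 15)]:
    lost = keepalives_that_can_be_lost(keepalive, hold)
    print(f"keepalive {keepalive}s, hold {hold}s: can lose {lost}")
```

Under this model, 5/10 tolerates no lost keepalive at all, while 5/15 survives one lost keepalive and fails only when two in a row are missed, which matches the 2:1 and 3:1 cases described above.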

Also check the platform type and link utilization: a fully utilized link without QoS protection for BGP packets may drop them when it is saturated.

For example, some platforms (2600, 3600 and ISR routers) have a hidden system queue for routing protocol traffic, but others, such as the C7500 and GSR, do not.

Since you have a 7206VXR, I would deploy a QoS policy to protect BGP packets on the link.

Hope this helps

Giuseppe

arjunsarkar · 06-23-2009

Hello Giuseppe,

I do see that one of the links has 2022 resets. Could that also be causing the BGP adjacency changes?

Regards

Arjun

Giuseppe Larosa · 06-23-2009

Hello Arjun,

Check whether the number of interface resets grows over time.

If it does, that is clearly a sign of OSI Layer 1 problems. However, I would also use a 15-second hold time on both sides, as Istvan suggested, for all the reasons we have explained.

If the link has problems, you may need to contact your WAN provider to perform loop tests.

See:

http://www.cisco.com/en/US/docs/internetworking/troubleshooting/guide/tr1915.html

Later you can fix the BGP configuration.
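The first check, whether the reset counter keeps growing, amounts to comparing readings taken at different times. A minimal Python sketch, assuming the readings are copied by hand from the "interface resets" line of `show interfaces`; the function name `resets_per_hour` and the sample values are hypothetical, not data from this thread.

```python
from datetime import datetime


def resets_per_hour(samples: list[tuple[datetime, int]]) -> float:
    """Average growth of an interface-resets counter between readings.

    `samples` are (timestamp, counter) pairs copied by hand from
    `show interfaces` at different times. A positive result means the
    interface is still resetting. The counter must not have been cleared
    between readings, otherwise the result is meaningless.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples)
    (first_time, first_count), (last_time, last_count) = ordered[0], ordered[-1]
    hours = (last_time - first_time).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("samples must be taken at different times")
    return (last_count - first_count) / hours


# Hypothetical readings taken six hours apart.
readings = [
    (datetime(2009, 6, 23, 9, 0), 100),
    (datetime(2009, 6, 23, 15, 0), 106),
]
print(resets_per_hour(readings))  # 1.0
```

A rate that stays at zero over several readings suggests the resets are historical; a steadily positive rate points back to the physical link.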

Hope this helps

Giuseppe

Istvan_Rabai · 06-23-2009

Hi Arjun,

With this configuration, the following timers are negotiated:

keepalive: 5 sec

holdtime: 10 sec

This means the BGP neighbor relationship will be reset if two keepalives in a row are not received by the neighbor.

In this scenario, one keepalive packet may be lost and the next one may arrive slightly late, which is enough to reset the neighbor relationship.
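The lost-then-late scenario can be replayed with a short Python sketch. The arrival times are hypothetical (a 0.3-second delay was picked only for illustration), and the model ignores keepalive jitter and any UPDATE messages, which would also restart the hold timer. The function name `hold_timer_expiry` is hypothetical.

```python
from typing import Optional, Sequence


def hold_timer_expiry(arrivals: Sequence[float], hold: float) -> Optional[float]:
    """Time at which the hold timer would expire, or None if it never does.

    `arrivals` are the times (seconds, ascending) at which keepalives actually
    reach the router; the first entry is the message that last restarted the
    timer. Only gaps inside the observed window are checked, and other BGP
    messages (which would also restart the timer) are ignored.
    """
    for previous, current in zip(arrivals, arrivals[1:]):
        # A gap of `hold` seconds or more lets the timer fire first
        # (ties are treated conservatively as an expiry).
        if current - previous >= hold:
            return previous + hold
    return None


# Hypothetical timeline: the keepalive due at 5 s is lost and the one
# due at 10 s arrives 0.3 s late.
arrivals = [0.0, 10.3]
print(hold_timer_expiry(arrivals, hold=10))  # 10.0 -> session reset
print(hold_timer_expiry(arrivals, hold=15))  # None -> session survives
```

The same single loss plus small delay tears down a session with a 10-second hold time but is absorbed by a 15-second one.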

In general, the hold time should be at least three times the keepalive timer, which in this case is 15 sec.

I would suggest setting the hold time to 15 sec on both BGP routers, so that the negotiated values will be 5 sec (keepalive) and 15 sec (hold time).

EDIT:

I noticed Giuseppe also responded. His suggestion to protect BGP traffic by configuring QoS is a great one.

Cheers,

Istvan