Solved: BGP Keeps on Dropping...

arnoldsotis110882 · 01-24-2012

Error:

Jan 23 02:00:21.003: %BGP-3-BGP_NO_REMOTE_READ: x.x.x.x connection timed out - has not accepted a message from us for 20000ms (hold time), 0 messages pending transmition.

Jan 23 02:00:21.003: %BGP-3-NOTIFICATION: sent to neighbor x.x.x.x passive 4/0 (hold time expired) 0 bytes

Jan 23 02:00:21.003: %BGP_SESSION-5-ADJCHANGE: neighbor x.x.x.x IPv4 Unicast topology base removed from session BGP Notification sent

I have already lowered the hold and keepalive timers, but nothing changed: BGP is still dropping frequently.

I also added a policy-map configuration, but I still get the same result.

policy-map parent
 class class-default
  shape average 3300000 103000 0
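For reference, the shaper's timing can be worked out from that line. In IOS the arguments to `shape average` are the CIR in bit/s, then Bc and Be in bits, and the shaping interval is Tc = Bc / CIR. A minimal Python sketch of that arithmetic, assuming Be can be ignored because it is 0 here (the function name is hypothetical):

```python
def shaper_interval(cir_bps: int, bc_bits: int) -> tuple[float, int]:
    """Return (Tc in milliseconds, bytes released per interval) for a shaper.

    Tc = Bc / CIR; Be is not modelled because it is 0 in this config.
    """
    tc_ms = bc_bits * 1000 / cir_bps   # shaping interval in ms
    bytes_per_tc = bc_bits // 8        # Bc converted from bits to bytes
    return tc_ms, bytes_per_tc


# Values from "shape average 3300000 103000 0"
tc_ms, burst_bytes = shaper_interval(3_300_000, 103_000)
print(f"Tc = {tc_ms:.1f} ms, up to {burst_bytes} bytes per interval")
# prints "Tc = 31.2 ms, up to 12875 bytes per interval"
```

Anything queued beyond that budget, BGP packets included, waits for a later interval.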

Can anyone help me figure out what exactly the issue is?

Thank you


Peter Paluch · 01-24-2012

Hi Arnold,

It seems as if the BGP communication with the flapping peer was somehow unreliable, lossy or blocked by some kind of firewall with short timeout periods. The message basically tells you that you have sent your BGP peer a message but it did not respond in proper time (in this case, 20 seconds). That can be caused by the fact that it either

- did not receive your message, or
- did not have the CPU power to process your message and send you a response, or
- sent you a response, but the response did not make it back to you.
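All three cases look the same from the local router's side: nothing arrives from the peer to reset the hold timer before it runs out. A minimal Python sketch of the generic RFC 4271 hold-timer rule (a received KEEPALIVE or UPDATE restarts the timer); this is a simplified model, not Cisco's internal implementation, and the class and method names are hypothetical:

```python
class HoldTimer:
    """Toy model of a BGP hold timer (RFC 4271 behaviour, simplified).

    Any KEEPALIVE or UPDATE received from the peer restarts the countdown;
    if nothing arrives for hold_time_s seconds, the session is torn down.
    """

    def __init__(self, hold_time_s: float, start_s: float = 0.0):
        self.hold_time_s = hold_time_s
        self.last_heard_s = start_s

    def message_received(self, now_s: float, msg_type: str) -> None:
        # Only KEEPALIVE and UPDATE restart the timer in this sketch.
        if msg_type in ("KEEPALIVE", "UPDATE"):
            self.last_heard_s = now_s

    def expired(self, now_s: float) -> bool:
        if self.hold_time_s == 0:  # a hold time of 0 disables the timer
            return False
        return now_s - self.last_heard_s >= self.hold_time_s


timer = HoldTimer(hold_time_s=20)  # 20 s, matching the 20000 ms in the log
timer.message_received(5, "KEEPALIVE")
print(timer.expired(20))  # False: only 15 s of silence so far
print(timer.expired(25))  # True: 20 s of silence, hold time expired
```

The timer cannot tell which of the three causes produced the silence, which is why each one has to be checked separately.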

Each of these possibilities has to be evaluated individually. However, without knowing more about the network between you and the flapping BGP peer, it is hard to give any guidance.

The policy-map you have created - what is it good for? Where did you apply it? If it is not required by your other QoS settings, I strongly suggest removing any QoS related additions you have added during the examination of this issue.

Best regards,

Peter

arnoldsotis110882 · 01-24-2012

Thank you Peter,

Yeah, I have already removed the QoS settings, but I still have the same issue. I also rebooted the router, with the same result.

Is there anything else I need to check?

Peter Paluch · 01-24-2012

Arnold,

Please try to focus on the three areas I have pointed out. They simply have to be carefully inspected. Hard to tell anything more at this point - we don't know anything about your network at all.

What messages does the opposite BGP router print out? Perhaps they contain some interesting debugging info.

Best regards,

Peter

JohnTylerPearce · 01-24-2012

I agree with Peter; we will need to know more about your network. Is this router the only router with a BGP connection? Is it connected directly to your ISP? Also, are you running an IGP throughout your network or relying on static routing?

arnoldsotis110882 · 01-24-2012

Hi John,

This is an iBGP peering between directly connected routers. The opposite router also has an eBGP peering, but that one works fine.

No IGP is running in the network.

The opposite BGP router prints the same messages.

Thank you

JohnTylerPearce · 01-24-2012

So what I take from this is that you have two directly connected routers, A and B, peering via iBGP, and one of them has an eBGP connection to C. Is this correct? And the problem is with the iBGP peer?

arnoldsotis110882 · 01-24-2012

Hi John,

Yes, that is correct! My problem is with the iBGP peer.

Thank you.

JohnTylerPearce · 01-25-2012

Peter gave some pretty good ideas. It sounds like there are problems with keepalives getting through. Can you check the log history of those two routers? It might tell you what's going on. That's a good place to start.

arnoldsotis110882 · 01-25-2012

Hi John,

It's working now! Changing the hold and keepalive timers didn't help, and neither did rebooting the router or adding the policy-map.

What made it work was physically moving the connection between routers A and B.

Is there any relationship between the BGP flapping and the port itself?

How much data traffic does it take to delay or drop the BGP keepalive packets?

If traffic is heavy, is it all right to lower the hold and keepalive timers?

And what is the effect of setting the hold and keepalive timers lower than the default?

Thank you so much!

JohnTylerPearce · 01-25-2012

Well, the whole purpose of keepalives is reachability. Think of them as a heartbeat: if Router A stops hearing heartbeats from Router B for the duration of the hold time, it considers Router B dead. When you tune your timers, just make sure not to make them so short that a little lag or delay brings the BGP peering down.
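To put numbers on that trade-off: per RFC 4271, each side uses the smaller of the two proposed hold times, a hold time must be 0 (timers disabled) or at least 3 seconds, and the keepalive interval is suggested to be one third of the hold time. IOS defaults are a 60 s keepalive and a 180 s hold time. A minimal Python sketch under those assumptions (the helper name is hypothetical):

```python
def negotiate_timers(local_hold_s: int, peer_hold_s: int):
    """Return (hold_time, keepalive) in seconds for a BGP session.

    Assumes RFC 4271 rules: the smaller proposed hold time wins, values of
    1 or 2 s are rejected, 0 disables both timers, and the keepalive
    interval is one third of the negotiated hold time.
    """
    for hold in (local_hold_s, peer_hold_s):
        if hold < 0 or hold in (1, 2):
            raise ValueError(f"unacceptable hold time: {hold}")
    hold = min(local_hold_s, peer_hold_s)
    if hold == 0:
        return 0, None  # no keepalives; silence never tears the session down
    return hold, hold / 3


print(negotiate_timers(180, 180))  # (180, 60.0): IOS defaults
print(negotiate_timers(180, 30))   # (30, 10.0): the lower proposal wins
```

Lowering the timers keeps the same ratio of roughly three keepalive intervals per hold time; what shrinks is the absolute window (180 s down to 30 s here) in which any lag, congestion or CPU spike has to clear.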

arnoldsotis110882 · 01-26-2012

Thank you John/Peter... Appreciate your help.

Arnold

rsimoni · 01-26-2012

Hi,

Check for an MTU-related issue on the link between the two routers.

Ping the neighbor with a 1500-byte packet (it's Ethernet, right?) with the DF bit set.

If your ping does not go through, you might have identified the issue.

What happens in these cases is that, as long as only small updates or keepalives are exchanged between the peers, no issue is exposed, because the messages being sent fit within the MTU.

When big BGP updates are sent, the router tries to pack multiple updates into the same packet until the negotiated TCP MSS is filled. A BGP update also counts as a keepalive, which means no separate small keepalive is sent while big updates are going through.

If the big updates are dropped along the way because of MTU issues, the peering will go down on hold-time expiration, because one of the two routers (or both) is not receiving keepalives (or the updates that count as keepalives).

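This is what happens when ICMP packets are filtered along the way: path MTU discovery depends on ICMP "fragmentation needed" messages, so if those are blocked, the sender never learns to use smaller segments.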
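The size difference behind this is easy to quantify. A BGP KEEPALIVE is just the 19-byte BGP header, while a full update segment carries a whole MSS of data; each needs a 20-byte IPv4 header and a 20-byte TCP header on top, assuming no IP or TCP options. A minimal Python sketch, where the 1400-byte path MTU is a hypothetical smaller hop somewhere between the peers:

```python
IP_HEADER = 20       # IPv4 header without options
TCP_HEADER = 20      # TCP header without options
BGP_KEEPALIVE = 19   # 16-byte marker + 2-byte length + 1-byte type


def fits(bgp_payload: int, path_mtu: int) -> bool:
    """True if a packet carrying this much BGP data fits the path MTU
    without fragmentation."""
    return bgp_payload + IP_HEADER + TCP_HEADER <= path_mtu


path_mtu = 1400  # hypothetical: e.g. a tunnel between the peers
mss = 1460       # negotiated from the 1500-byte MTU at each end

print(fits(BGP_KEEPALIVE, path_mtu))  # True: 59-byte keepalives get through
print(fits(mss, path_mtu))            # False: 1500-byte update segments do not
```

This is why the session can look healthy for hours and then drop as soon as a large batch of updates has to be sent.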

You can also quickly check the negotiated MSS of the BGP session on both routers and verify that the value is consistent with the actual link MTU.

Example:

iBGP is running between R1 and R4:

R1#sh ip bgp ne 10.100.1.4 | inc max data segment
Datagrams (max data segment is 1460 bytes):

R4#sh ip bgp ne 10.100.1.1 | inc max data segment
Datagrams (max data segment is 1460 bytes):
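That comparison can be scripted. A minimal Python sketch that pulls the value out of the `max data segment` line shown above and compares it with the MSS expected for a given link MTU (MTU minus 40 bytes of IPv4 and TCP headers, assuming no options); the function names are hypothetical:

```python
import re


def parse_mss(show_line: str) -> int:
    """Extract N from 'Datagrams (max data segment is N bytes):'."""
    match = re.search(r"max data segment is (\d+) bytes", show_line)
    if match is None:
        raise ValueError(f"no MSS found in: {show_line!r}")
    return int(match.group(1))


def expected_mss(link_mtu: int) -> int:
    return link_mtu - 40  # 20-byte IPv4 header + 20-byte TCP header


line = "Datagrams (max data segment is 1460 bytes):"
mss = parse_mss(line)
print(mss, mss == expected_mss(1500))  # 1460 True
```

A value below MTU minus 40 only means smaller segments; 536, for example, is the classic TCP default MSS (a 576-byte datagram minus 40 bytes of headers).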

Firuz Azimov · 06-12-2013

Hi Riccardo,

Based on your explanation, I think I have the same problem.

I have an eBGP session, and when I advertise my whole routing table to the remote peer, the neighborship breaks.

Logs:

Jun 11 22:48:04: %BGP-3-BGP_NO_REMOTE_READ: 192.168.4.58 connection timed out - has not accepted a message from us for 180000ms (hold time), 1 messages pending transmition.

Jun 11 22:48:04: %BGP-5-ADJCHANGE: neighbor 192.168.4.58 Down BGP Notification sent

But when I filter some routes, it works fine.

I think you are right that this is an MTU issue.

Router#show ip bgp neighbors 192.168.4.58 | i max data segment
Datagrams (max data segment is 536 bytes):

The remote peer shows the same output.

But the ping is successful:

Router#ping 192.168.4.58 source 192.168.4.57 df-bit size 1500
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 192.168.4.58, timeout is 2 seconds:
Packet sent with a source address of 192.168.4.57
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/8/12 ms

How can I resolve this problem?

Thanks for your help.

Regards

Jose Jara · 06-12-2013

Firuz,

As the 1500-byte ping works and the MSS is 536 bytes, it looks as if one of the peers has ip tcp mss 536 configured. I would remove that and then clear the BGP session. The BGP session should then come up with an MSS of 1460 bytes.

Best Regards,

Jose.