BGP flapping with MTU mismatch

nassef2010 · ‎05-25-2024

Hello Everyone,

This is a typical issue, but I want more details. In many cases BGP flaps every 180 seconds (the hold timer), and once we set the same MTU on both sides it works fine. But how could MTU affect the keepalive messages, which are only 19 bytes? Any interface MTU can handle them. What is the relation between flapping and MTU?

We assume here that the flapping is caused by hold timer expiration due to missing keepalive messages. Or could it be caused by something else that is related to MTU?

Thank you.

MHM Cisco World · ‎05-25-2024

You are correct.

The keepalive is small and passes,

but the update does not pass, and that causes the issue. That is how BGP detects a low MTU in the path.

It is not the keepalive, it is the update.

MHM

nassef2010 · ‎05-25-2024

Thank you for your reply.

But how is the UPDATE message related to the hold timer? I understand that a peer might miss updates if the update message was not received, but why does the hold timer expire as long as the keepalive messages get through?

MHM Cisco World · ‎05-25-2024

Some routers send the keepalive together with the update to the peer. This combined packet is larger than the MTU, so the peer drops both the update and the keepalive, and hence you get this log error.

You can check this further by configuring only the BGP neighbor, with no redistribute and no network commands. You will see BGP come up and stay stable. After you add redistribute and network, you will see the same log error appear again.

MHM

Harold Ritter · ‎05-25-2024

Hi @MHM Cisco World ,

> but the update does not pass, and that causes the issue. That is how BGP detects a low MTU in the path.

What you are referring to is path MTU discovery (PMTUD).

When an intermediate router receives an IP packet too large to be forwarded through the egress interface and the don't fragment (DF) bit is set in the IP header, it drops the packet and sends an ICMP "packet too big" message (for IPv4, a destination unreachable "fragmentation needed" message) back to the source. The router running BGP uses these ICMP messages to reduce the TCP maximum segment size (MSS) for the session, which fixes the issue of packets being dropped in transit because of their size.

The issue the OP is referring to normally happens when PMTUD is disabled.
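
As a rough illustration of the size arithmetic involved, here is a minimal Python sketch. The MTU values (1500 assumed by the sender, 1000 on the path) are illustrative assumptions, and the 20-byte IP and TCP header sizes assume IPv4 with no options. It is not a model of any particular router's behavior.

```python
# Header sizes assume IPv4 and TCP without options (an assumption of this sketch).
IP_HEADER = 20
TCP_HEADER = 20
BGP_KEEPALIVE = 19  # a BGP keepalive is just the 19-byte BGP header


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet of this MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def fits(payload: int, path_mtu: int) -> bool:
    """True if a TCP segment carrying `payload` bytes fits within the path MTU."""
    return payload + IP_HEADER + TCP_HEADER <= path_mtu


assumed_mtu = 1500  # what the sender believes (illustrative value)
path_mtu = 1000     # smallest MTU actually on the path (illustrative value)

full_segment = mss_for_mtu(assumed_mtu)   # 1460 bytes of BGP data
reduced_mss = mss_for_mtu(path_mtu)       # 960 bytes, what PMTUD would converge to

print(fits(BGP_KEEPALIVE, path_mtu))  # True: keepalives always get through
print(fits(full_segment, path_mtu))   # False: a full-size update segment is dropped
print(fits(reduced_mss, path_mtu))    # True: after the MSS is reduced, updates fit
```

With PMTUD disabled, nothing tells the sender to shrink its segments, so it keeps retransmitting the same oversized segment.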

Regards,
Harold Ritter, CCIE #4168 (EI, SP)

Harold Ritter · ‎05-25-2024

Hi @nassef2010 , @MHM Cisco World ,

Even if the keepalive messages are not sent in the same segment as the update messages, they will arrive at the destination out of order and will be queued. After the hold timer expires, the session will be reset.

Regards,
Harold Ritter, CCIE #4168 (EI, SP)

MHM Cisco World · ‎05-25-2024

The keepalive is small and, as I mentioned, if he establishes BGP alone without updates (without network or redistribute), BGP will be up and stable.

The MTU affects the (small) keepalive when it is sent with the update.

How does out of order happen here?

MHM

Harold Ritter · ‎05-25-2024

Hi @MHM Cisco World ,

> The keepalive is small and, as I mentioned, if he establishes BGP alone without updates (without network or redistribute), BGP will be up and stable.

Agreed. The issue happens when you have large packets being dropped in transit because of their size and PMTUD is disabled.

> The MTU affects the (small) keepalive when it is sent with the update.

Correct. However, keepalives will not always be sent with updates, and they will fail anyway if the BGP updates are being dropped, as the keepalives will arrive at the destination out of order from a TCP point of view.

> How does out of order happen here?

The keepalives and updates are part of the same TCP session. The keepalives sent after the updates will have a higher TCP sequence number and if the updates are being dropped in transit, the receiving router considers the keepalives as out of order, queues them and waits for the missing segments to arrive before processing the keepalives. The missing segments will never arrive and the hold timer expires.
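
To make the queuing behavior concrete, here is a toy Python model of in-order TCP delivery. It is a sketch under simplifying assumptions, not how any real TCP stack is implemented: the `TcpReceiver` class, the starting sequence number 1000 and the 1460-byte update are all hypothetical.

```python
KEEPALIVE = bytes(19)  # a BGP keepalive is a bare 19-byte header
UPDATE_LEN = 1460      # hypothetical large UPDATE segment


class TcpReceiver:
    """Toy model: data reaches the application (BGP) only when it is
    contiguous with everything delivered so far."""

    def __init__(self, initial_seq):
        self.next_expected = initial_seq
        self.save_queue = {}   # seq -> payload held as out of order
        self.delivered = []    # payloads that BGP actually gets to see

    def receive(self, seq, payload):
        if seq < self.next_expected:
            return  # already delivered; ignore the duplicate
        self.save_queue[seq] = payload
        # Hand over every segment that is now contiguous.
        while self.next_expected in self.save_queue:
            data = self.save_queue.pop(self.next_expected)
            self.delivered.append(data)
            self.next_expected += len(data)

    def queued_bytes(self):
        return sum(len(p) for p in self.save_queue.values())


rx = TcpReceiver(initial_seq=1000)
seq = 1000 + UPDATE_LEN  # the UPDATE at seq 1000 was dropped in transit

queue_sizes = []
for _ in range(3):
    rx.receive(seq, KEEPALIVE)
    queue_sizes.append(rx.queued_bytes())
    seq += len(KEEPALIVE)

delivered_while_stalled = len(rx.delivered)
print(queue_sizes)              # [19, 38, 57]
print(delivered_while_stalled)  # 0: BGP never sees the keepalives

# If the missing segment ever arrived, everything would be released at once.
rx.receive(1000, bytes(UPDATE_LEN))
print(len(rx.delivered))        # 4
print(rx.queued_bytes())        # 0
```

The keepalives reach the receiver, but as long as the hole at sequence 1000 is never filled, nothing is passed up to BGP.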

Regards,
Harold Ritter, CCIE #4168 (EI, SP)

MHM Cisco World · ‎05-25-2024

So you mean that if one keepalive between two peers is missing, the BGP session is dead? What if some issue other than MTU happens, then BGP flaps for a single keepalive? The peer does not care about the TCP order of a keepalive, it is a dummy message, and as far as I know the peer is only counted as down after three missed keepalives.
No, I don't think so.

R1(1500)-R2(1000)

R1 has many prefixes, so it divides the update into a series of packets, and it adds the keepalive to these packets.
R2 will drop the update+keepalive. This happens three times, so R2 declares that R1 is dead.

So let him check without updates.

MHM

Harold Ritter · ‎05-25-2024

Hi @MHM Cisco World ,

> So you mean that if one keepalive between two peers is missing, the BGP session is dead?

No, one missed keepalive is not sufficient. You need to wait for the hold timer to expire for the BGP session to be reset.

> The peer does not care about the TCP order of a keepalive

Yes, it certainly does. When the peer receives an out of order TCP segment, it queues it and waits for the missing segment(s) to be received before processing the out of order TCP segment. If the missing TCP segments are received, the peer processes them and it does not cause any issue.

In the case of BGP updates larger than the path MTU, it is an issue because they never get to the destination and the out of order keepalives are never processed, leading to the hold timer expiring.

Here's what you can see on the receiving router using "deb ip tcp packet":

R2#deb ip tcp packet

*May 25 17:17:35.108: %BGP-5-ADJCHANGE: neighbor 192.168.12.1 Up
...
*May 25 17:17:44.293: TCP0: seq 2344113555 out-of-order, 19 bytes in save queue
...
*May 25 17:17:54.526: TCP0: seq 2344113574 out-of-order, 38 bytes in save queue
...
*May 25 17:18:02.717: TCP0: seq 2344113593 out-of-order, 57 bytes in save queue

*May 25 17:18:06.064: %BGP-3-NOTIFICATION: sent to neighbor 192.168.12.1 4/0 (hold time expired) 0 bytes

*May 25 17:18:06.067: %BGP-5-NBR_RESET: Neighbor 192.168.12.1 reset (BGP Notification sent)

*May 25 17:18:06.069: %BGP-5-ADJCHANGE: neighbor 192.168.12.1 Down BGP Notification sent
*May 25 17:18:06.070: %BGP_SESSION-5-ADJCHANGE: neighbor 192.168.12.1 IPv4 Unicast topology base removed from session BGP Notification sent

You can clearly see the 3 keepalive messages (19 bytes each) being queued by TCP and then the BGP session going down after 3 missed keepalives (timers configured as timers bgp 10 30).
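
The timing can be sketched with a simple hold timer model in Python. This is only an illustration: `hold_timer_expiry` is a hypothetical helper, time 0 stands for the moment the session comes up, and the 10 and 30 second values are the keepalive and hold timers mentioned above.

```python
HOLD_TIME = 30  # seconds


def hold_timer_expiry(processed_at, hold_time, horizon):
    """Return when the hold timer expires, or None if it does not expire
    within `horizon` seconds. `processed_at` lists the times at which BGP
    actually processes a message (not the times TCP receives a segment)."""
    deadline = hold_time  # the timer starts when the session comes up at t=0
    for t in sorted(processed_at):
        if t >= deadline:
            break  # too late, the timer has already expired
        deadline = t + hold_time  # every processed message restarts the timer
    return deadline if deadline <= horizon else None


# Healthy session: a keepalive is processed every 10 seconds.
healthy = hold_timer_expiry(range(10, 121, 10), HOLD_TIME, horizon=120)

# Stalled session: keepalives reach TCP at 10, 20 and 30 seconds, but they
# sit in the save queue, so BGP processes nothing at all.
stalled = hold_timer_expiry([], HOLD_TIME, horizon=120)

print(healthy)  # None
print(stalled)  # 30
```

This lines up with the log above, where the notification is sent about 31 seconds after the session came up (17:17:35.108 to 17:18:06.064).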

> R2 will drop the update+keepalive. This happens three times, so R2 declares that R1 is dead.

This is another case where it happens, namely when the keepalive is delivered in the same TCP segment as the update. If this TCP segment gets dropped in transit, the keepalive is obviously never received.

You can see that in both cases the hold timer will eventually expire and the BGP session will be reset.

> So let him check without updates.

This will obviously work, as the issue is normally caused by large BGP update messages.

Regards,
Harold Ritter, CCIE #4168 (EI, SP)

MHM Cisco World · ‎05-25-2024

Friend, the keepalive is always the same size. Why is the size different in the log lines related to keepalives?

*May 25 17:17:44.293: TCP0: seq 2344113555 out-of-order, 19 bytes in save queue
...
*May 25 17:17:54.526: TCP0: seq 2344113574 out-of-order, 38 bytes in save queue
...
*May 25 17:18:02.717: TCP0: seq 2344113593 out-of-order, 57 bytes in save queue

As I understand it, you are saying that

R1 sends keepalive 1 and it is dropped, then sends keepalive 2, which is not dropped, and R2 treats it as out of order? If that is what you mean, then surely no. If R2 receives keepalive 2, then it knows that R1 is alive.

Out of order happens when there are multiple paths with different bandwidth.

MHM

Harold Ritter · ‎05-25-2024

Hi @MHM Cisco World ,

> Why is the size different in the log lines related to keepalives?

It is not different sizes. What you see in the log is the queue size. After one keepalive is queued, the queue size is 19 bytes, after 2 it is 38 bytes and after 3 it is 57 bytes.
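
This can be checked directly against the three debug lines quoted above. The short Python sketch below parses them; the regular expression is an assumption based only on the format of those three lines, not a general parser for this debug output.

```python
import re

# The three out-of-order lines from the debug output in this thread.
LOG = """\
*May 25 17:17:44.293: TCP0: seq 2344113555 out-of-order, 19 bytes in save queue
*May 25 17:17:54.526: TCP0: seq 2344113574 out-of-order, 38 bytes in save queue
*May 25 17:18:02.717: TCP0: seq 2344113593 out-of-order, 57 bytes in save queue
"""

KEEPALIVE_LEN = 19  # size of one BGP keepalive in bytes

pattern = re.compile(r"seq (\d+) out-of-order, (\d+) bytes in save queue")
entries = [(int(seq), int(queued)) for seq, queued in pattern.findall(LOG)]

# Number of whole keepalives sitting in the save queue after each line.
queued_keepalives = [queued // KEEPALIVE_LEN for _, queued in entries]

# Distance between consecutive sequence numbers.
seq_steps = [b[0] - a[0] for a, b in zip(entries, entries[1:])]

print(queued_keepalives)  # [1, 2, 3]
print(seq_steps)          # [19, 19]
```

Each new segment starts exactly 19 bytes after the previous one and the queue grows by 19 bytes each time, which is consistent with three equally sized keepalives piling up.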

> If R2 receives keepalive 2, then it knows that R1 is alive.

The keepalives are received by the destination router TCP component. The TCP component considers them as out of order, queues them and does not pass them to BGP. That is why the hold timer expires.

> Out of order happens when there are multiple paths with different bandwidth.

Not necessarily. In this case there is only one path, but certain packets are dropped (large BGP update) and others not (small BGP keepalives).

Regards,
Harold Ritter, CCIE #4168 (EI, SP)

MHM Cisco World · ‎05-25-2024

Is this a lab? What MTU did you configure to make one peer send packets of less than 20 bytes?

If it is a lab, set one side to 1000 and the other to 1500 and share the result.

Sure, a 20-byte MTU will cause issues for keepalives.

MHM

Harold Ritter · ‎05-25-2024

Hi @MHM Cisco World ,

I think you are misunderstanding my explanations. The keepalives can reach the destination. They are simply queued by the receiver because they are out of order from a TCP standpoint, and they are considered out of order because the update messages are being dropped in transit.

Can you please reread my previous explanations?

Regards,
Harold Ritter, CCIE #4168 (EI, SP)

MHM Cisco World · ‎05-25-2024

I will share a lab and some debug output tomorrow.

Thanks

MHM