L2VPN/EVPN E-BGP Peers Flapping

LewisD1
Level 1

Hello,

I'm having an issue with BGP sessions expiring. The issue seems to resolve itself after hours or days, then everything works solidly for weeks before it recurs, and I'm drawing a blank. All the iBGP sessions work without issues; only the eBGP sessions are causing me grief. I'm using eBGP multihop for the peering, with dual 10G links and a multi-area OSPF underlay.

This is what I'm seeing in the logs:

2022 May 24 12:35:51.876119 bgp: [23668] (default) ADJ: 122.122.122.122 keepalive timer fired
2022 May 24 12:35:51.876144 bgp: [23668] (default) ADJ: 122.122.122.122 keepalive timer fired for peer
2022 May 24 12:35:51.876155 bgp: [23668] (default) ADJ: 122.122.122.122 sending KEEPALIVE
2022 May 24 12:35:51.876639 bgp: [23668] (default) ADJ: 122.122.122.122 next keepalive expiry due in 00:00:59

2022 May 24 12:32:22.329490 bgp: [23668] (default) ADJ: 121.121.121.121 keepalive timer fired
2022 May 24 12:32:22.329520 bgp: [23668] (default) ADJ: 121.121.121.121 keepalive timer fired for peer
2022 May 24 12:32:22.329537 bgp: [23668] (default) ADJ: 121.121.121.121 sending KEEPALIVE
2022 May 24 12:32:22.330033 bgp: [23668] (default) ADJ: 121.121.121.121 next keepalive expiry due in 00:00:59

On the other side I see this:

2022 May 24 12:34:18.077881 bgp: [26672] (default) ADJ: 221.221.221.221 keepalive timer fired
2022 May 24 12:34:18.077907 bgp: [26672] (default) ADJ: 221.221.221.221 keepalive timer fired for peer
2022 May 24 12:34:18.077918 bgp: [26672] (default) ADJ: 221.221.221.221 sending KEEPALIVE
2022 May 24 12:34:18.078387 bgp: [26672] (default) ADJ: 221.221.221.221 next keepalive expiry due in 00:00:59
2022 May 24 12:34:18.086460 bgp: [26672] (default) ADJ: Peer 221.221.221.221 has pending data on socket during recv, extending expiry timer
2022 May 24 12:34:18.086940 bgp: [26672] (default) ADJ: 221.221.221.221 KEEPALIVE rcvd

2022 May 24 12:34:03.584558 bgp: [26672] (default) ADJ: 222.222.222.222 keepalive timer fired
2022 May 24 12:34:03.584588 bgp: [26672] (default) ADJ: 222.222.222.222 keepalive timer fired for peer
2022 May 24 12:34:03.584598 bgp: [26672] (default) ADJ: 222.222.222.222 sending KEEPALIVE
2022 May 24 12:34:03.585085 bgp: [26672] (default) ADJ: 222.222.222.222 next keepalive expiry due in 00:00:59
2022 May 24 12:34:03.587219 bgp: [26672] (default) ADJ: Peer 222.222.222.222 has pending data on socket during recv, extending expiry timer
2022 May 24 12:34:03.587696 bgp: [26672] (default) ADJ: 222.222.222.222 KEEPALIVE rcvd

So keepalives are being sent in both directions but are only received on one side. Every time the other side sends one, it logs "has pending data on socket during recv, extending expiry timer". This causes the hold timers to expire, and the whole cycle starts over again.
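
The failure described above follows the standard BGP hold-timer rule: the timer restarts whenever a message (KEEPALIVE, UPDATE, ...) arrives from the peer, and the session is torn down if nothing arrives for the whole hold time. Below is a simplified model, not NX-OS's implementation. It assumes the session comes up at t = 0 and uses the 180 s hold time that `show ip bgp neighbor` reports later in this thread. The function name `hold_timer_expiry` is hypothetical.

```python
from typing import Iterable, Optional


def hold_timer_expiry(rx_times: Iterable[float], observe_until: float,
                      hold_time: float = 180.0) -> Optional[float]:
    """Return the time (seconds after the session came up) at which the hold
    timer would expire, or None if it does not expire before observe_until.

    rx_times are arrival times of BGP messages received from the peer.
    Simplified model: the timer restarts on every received message.
    """
    last_rx = 0.0  # session established at t = 0
    for t in sorted(rx_times):
        if t - last_rx > hold_time:
            # A full hold interval passed with nothing received.
            return last_rx + hold_time
        last_rx = t
    if observe_until - last_rx > hold_time:
        return last_rx + hold_time
    return None
```

When keepalives arrive every 60 s, `hold_timer_expiry([60, 120, 180], 300)` returns `None`. If one side never sees a keepalive after the session comes up, `hold_timer_expiry([], 300)` returns `180.0`, which matches the "holdtimer expired" resets reported later in the thread.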

Site 1 relationships:

show bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 121.121.121.121, local AS number 65001
BGP table version is 879001, L2VPN EVPN config peers 4, capable peers 4
356 network entries and 505 paths using 104744 bytes of memory
BGP attribute entries [77/13244], BGP AS path entries [1/6]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
111.111.111.111 4 65001 80151 67103 879001 0 0 6w5d 118
112.112.112.112 4 65001 69159 64569 879001 0 0 6w5d 89
221.221.221.221 4 65002 130290 71161 879001 0 0 00:00:50 149
222.222.222.222 4 65002 130307 71155 879001 0 0 00:00:05 149

Site 2

show bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 221.221.221.221, local AS number 65002
BGP table version is 52220, L2VPN EVPN config peers 4, capable peers 4
149 network entries and 149 paths using 36356 bytes of memory
BGP attribute entries [36/6192], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
121.121.121.121 4 65001 48978 46551 52220 0 0 00:01:34 0
122.122.122.122 4 65001 48986 46548 52220 0 0 00:01:16 0
211.211.211.211 4 65002 48500 40812 52220 0 0 4w1d 92
212.212.212.212 4 65002 42174 39843 52220 0 0 4w1d 57

show run bgp

router bgp 65002
router-id 221.221.221.221
log-neighbor-changes
address-family ipv4 unicast
address-family l2vpn evpn
retain route-target all
template peer INTER-BGP-PEER
remote-as 65001
update-source loopback0
ebgp-multihop 10
address-family ipv4 unicast
send-community
send-community extended
address-family l2vpn evpn
send-community
send-community extended
route-map NH-Unchanged out
template peer INTRA-BGP-PEER
remote-as 65002
update-source loopback0
address-family ipv4 unicast
send-community
send-community extended
route-reflector-client
address-family l2vpn evpn
send-community
send-community extended
route-reflector-client
neighbor 121.121.121.121
inherit peer INTER-BGP-PEER
remote-as 65001
neighbor 122.122.122.122
inherit peer INTER-BGP-PEER
remote-as 65001
neighbor 211.211.211.211
inherit peer INTRA-BGP-PEER
neighbor 212.212.212.212
inherit peer INTRA-BGP-PEER

show run bgp

router bgp 65001
router-id 121.121.121.121
log-neighbor-changes
address-family ipv4 unicast
address-family l2vpn evpn
retain route-target all
template peer INTER-BGP-PEER
remote-as 65002
update-source loopback0
ebgp-multihop 10
address-family ipv4 unicast
send-community
send-community extended
address-family l2vpn evpn
send-community
send-community extended
route-map NH-Unchanged out
template peer INTRA-BGP-PEER
remote-as 65001
update-source loopback0
address-family ipv4 unicast
send-community
send-community extended
route-reflector-client
address-family l2vpn evpn
send-community
send-community extended
route-reflector-client
neighbor 111.111.111.111
inherit peer INTRA-BGP-PEER
neighbor 112.112.112.112
inherit peer INTRA-BGP-PEER
neighbor 221.221.221.221
inherit peer INTER-BGP-PEER
remote-as 65002
neighbor 222.222.222.222
inherit peer INTER-BGP-PEER
remote-as 65002

Software Version 

BIOS: version 07.69

NXOS: version 9.3(8)

Any help would be appreciated. Like I say, it will just fix itself without any changes, but once it's down I can't seem to get the peers to form.

Thanks

Accepted Solution

Both peers have mismatched MTUs, and this causes high CPU utilization and makes the router get stuck, so please configure the TCP MSS.
For spine and leaf, first try configuring the TCP MSS under the loopback (the one you use as the update source). If that doesn't work, you need to configure ip tcp mss on the underlay interfaces connecting leaf to spine (there is no direct connection between leafs).

Check again with

show sockets connection tcp foreign

to see whether the new MSS value is accepted or not.
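
For reference, TCP normally derives the MSS it advertises from the interface MTU minus the IP and TCP headers. The helper below sketches that arithmetic only and is not NX-OS configuration. It assumes IPv4 and TCP headers without options (20 bytes each), and the function name is hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload per segment that fits in one IPv4 packet of size mtu."""
    if mtu <= IPV4_HEADER + TCP_HEADER:
        raise ValueError("MTU too small to carry any TCP payload")
    return mtu - IPV4_HEADER - TCP_HEADER


for mtu in (1500, 9000, 9216):
    print(mtu, tcp_mss_for_mtu(mtu))
```

Suppose both BGP endpoints sit on 9000-byte links and settle on an MSS of 8960. Their large segments could still fail to cross any hop limited to 1500 bytes (MSS 1460). Pinning a smaller MSS, as the reply suggests, avoids that case.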

25 Replies

show bgp summary shows nothing in OutQ, so both peers send and receive keepalives, but the messages are being dropped somewhere along the way.

show tcp brief
Check that the router that does not receive the keepalives is using the well-known TCP port 179 and not an unknown port.

This suggests there is a firewall in the path that drops the TCP connection. Since keepalives don't carry the SYN flag, the firewall drops them ("drop in one direction"). Then, after both peers find that TCP is down, they establish a new TCP session with a new SYN packet, and this is allowed through the firewall.

So, depending on your topology, check this point.
Note: if you have two firewalls and the IGP shifts traffic from one firewall to the other, you will also face the same issue.
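
The firewall hypothesis can be made concrete with a toy model of a stateful filter. This is an illustration only and does not describe how any particular firewall works. A flow is admitted only when a SYN creates state for it. Later segments pass only while that state exists. A second firewall that never saw the SYN has no state at all. The class and method names are hypothetical, and real firewalls also track sequence numbers and handshake progress.

```python
from typing import FrozenSet, Set, Tuple

# (source IP, source port, destination IP, destination port)
Flow = Tuple[str, int, str, int]


class ToyStatefulFirewall:
    """Toy model: only SYN creates state; other segments need existing state."""

    def __init__(self) -> None:
        self._flows: Set[FrozenSet[Tuple[str, int]]] = set()

    @staticmethod
    def _key(flow: Flow) -> FrozenSet[Tuple[str, int]]:
        # Direction-independent key, so replies match the same state entry.
        src_ip, src_port, dst_ip, dst_port = flow
        return frozenset({(src_ip, src_port), (dst_ip, dst_port)})

    def forward(self, flow: Flow, syn: bool) -> bool:
        """Return True if the segment is forwarded, False if it is dropped."""
        key = self._key(flow)
        if syn:
            self._flows.add(key)
            return True
        return key in self._flows

    def expire(self, flow: Flow) -> None:
        """Simulate the state entry timing out or being cleared."""
        self._flows.discard(self._key(flow))
```

In this model, a KEEPALIVE sent after `expire()` is dropped until the peers reconnect with a fresh SYN. A KEEPALIVE rerouted through a second `ToyStatefulFirewall` instance is dropped straight away.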

Hi,

There is no firewall between these Nexus switches; these are cross-connects between DC racks.

Total peers 4, established peers 4
ASN 65002
VRF default, local ASN 65002
peers 4, established peers 4, local router-id 221.221.221.221
State: I-Idle, A-Active, O-Open, E-Established, C-Closing, S-Shutdown

Neighbor ASN Flaps LastUpDn|LastRead|LastWrit St Port(L/R) Notif(S/R)
121.121.121.121 65001 1136 00:02:12|00:02:12|00:00:11 E 179/31544 1136/0
122.122.122.122 65001 1136 00:02:00|00:02:00|00:00:59 E 179/45164 1136/0
211.211.211.211 65002 0 4w1d |00:00:51|00:00:02 E 179/31192 0/0
212.212.212.212 65002 0 4w1d |00:00:02|00:00:51 E 179/34865 0/0

You can see it is using port 179.

Thanks

Lewis

LewisD1

Any ideas? Really scratching my head on this one.

I haven't finished analyzing the issue, but a flapping BGP peer may be caused by:
1- Bad BGP updates
2- MTU mismatch
3- High CPU
4- Improper Control-Plane Policing (CoPP)
5- Hold timer expiry

Based on the info you give me, I will send you a roadmap for checking which of these five is causing the BGP flap.

Until then, you can check:
High CPU
Control-Plane Policing
MTU mismatch

For the MTU mismatch, this is easy: ping the BGP peer at the full MTU with the DF bit set and see whether the ping succeeds.
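
When sizing that ping, it helps to know how the ping size relates to the MTU. The outputs in this thread show "9000 data bytes" answered by "9008 bytes", which suggests the NX-OS `packet-size` is the ICMP data size. The IPv4 packet on the wire then adds an 8-byte ICMP header and a 20-byte IP header. The sketch below shows that arithmetic, assuming IPv4 without options. The helper names are hypothetical.

```python
ICMP_HEADER = 8   # bytes, echo request/reply header
IPV4_HEADER = 20  # bytes, assuming no IP options


def ip_packet_size(icmp_data_bytes: int) -> int:
    """Size of the IPv4 packet carrying an echo request with this much data."""
    return icmp_data_bytes + ICMP_HEADER + IPV4_HEADER


def max_df_ping_data(path_mtu: int) -> int:
    """Largest ICMP data size that fits the path MTU with DF set (no fragmentation)."""
    return path_mtu - ICMP_HEADER - IPV4_HEADER


print(ip_packet_size(9000))
print(max_df_ping_data(1500))
print(max_df_ping_data(9216))
```

A DF-bit ping with packet-size 9000 therefore needs every hop to accept a 9028-byte IPv4 packet. Any 1500-byte hop would reject data sizes above 1472.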

Hello MHM,

The hold timer is expiring because it is not able to receive the messages. This is the error I am receiving:

Peer .... has pending data on socket during recv, extending expiry timer

I am currently using the default timers, but I have tried adjusting the timers as well.

We currently have the CoPP profile set to strict on the 4 switches.

It's not an MTU mismatch, because the relationship forms OK; it just fails to stay up. I have done a ping test just to be sure.

ping 222.222.222.222 packet-size 9000
PING 222.222.222.222 (222.222.222.222): 9000 data bytes
9008 bytes from 222.222.222.222: icmp_seq=0 ttl=252 time=3.918 ms
9008 bytes from 222.222.222.222: icmp_seq=1 ttl=252 time=3.395 ms
9008 bytes from 222.222.222.222: icmp_seq=2 ttl=252 time=3.346 ms
9008 bytes from 222.222.222.222: icmp_seq=3 ttl=252 time=3.357 ms
9008 bytes from 222.222.222.222: icmp_seq=4 ttl=252 time=3.452 ms

Thanks

Lewis

Please do one more ping test, using the BGP peer IP as the source IP.

Done another one:

# ping 2.2.2.3 source 121.121.121.121 packet-size 9000
PING 2.2.2.3 (2.2.2.3) from 121.121.121.121: 9000 data bytes
9008 bytes from 2.2.2.3: icmp_seq=0 ttl=252 time=3.873 ms
9008 bytes from 2.2.2.3: icmp_seq=1 ttl=252 time=3.468 ms
9008 bytes from 2.2.2.3: icmp_seq=2 ttl=252 time=3.379 ms
9008 bytes from 2.2.2.3: icmp_seq=3 ttl=252 time=3.379 ms
9008 bytes from 2.2.2.3: icmp_seq=4 ttl=252 time=3.44 ms

At time 1, run
show bgp l2vpn evpn summary
on both peers.
At time 2, run
show bgp l2vpn evpn summary
on both peers again.
Please share the output here if you can.
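
Comparing two snapshots by hand is error-prone. The sketch below parses the neighbor rows of two `show bgp l2vpn evpn summary` outputs and reports how much MsgRcvd and MsgSent changed for each neighbor. It assumes the column order shown in this thread (Neighbor V AS MsgRcvd MsgSent ...) and IPv4 neighbor addresses. The function names are hypothetical. A negative delta would suggest the counters were cleared between the two snapshots.

```python
import re
from typing import Dict, Tuple

# Neighbor row: Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
ROW = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s+\d+\s+\d+\s+(\d+)\s+(\d+)\s")


def parse_msg_counters(summary: str) -> Dict[str, Tuple[int, int]]:
    """Map neighbor address -> (MsgRcvd, MsgSent) from summary text."""
    counters: Dict[str, Tuple[int, int]] = {}
    for line in summary.splitlines():
        match = ROW.match(line.strip())
        if match:
            counters[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return counters


def counter_deltas(before: str, after: str) -> Dict[str, Tuple[int, int]]:
    """Per-neighbor (delta MsgRcvd, delta MsgSent) between two snapshots."""
    old, new = parse_msg_counters(before), parse_msg_counters(after)
    return {
        peer: (new[peer][0] - old[peer][0], new[peer][1] - old[peer][1])
        for peer in new
        if peer in old
    }
```

Header lines such as "BGP router identifier ..." and "Neighbor V AS ..." do not start with an address, so the parser skips them.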

show bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 121.121.121.121, local AS number 65001
BGP table version is 786094, L2VPN EVPN config peers 4, capable peers 4
348 network entries and 491 paths using 102072 bytes of memory
BGP attribute entries [73/12556], BGP AS path entries [1/6]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
111.111.111.111 4 65001 5243 4274 786094 0 0 2d22h 117
112.112.112.112 4 65001 4275 4113 786094 0 0 2d22h 88
221.221.221.221 4 65002 61058 9410 786094 0 0 00:01:41 143
222.222.222.222 4 65002 3759 646 786094 0 0 00:00:01 143

---------------------------

show bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 122.122.122.122, local AS number 65001
BGP table version is 1686043, L2VPN EVPN config peers 4, capable peers 4
348 network entries and 491 paths using 102072 bytes of memory
BGP attribute entries [73/12556], BGP AS path entries [1/6]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
111.111.111.111 4 65001 85526 71264 1686043 0 0 7w1d 117
112.112.112.112 4 65001 73554 68564 1686043 0 0 7w1d 88
221.221.221.221 4 65002 193260 80898 1686043 0 0 00:02:44 143
222.222.222.222 4 65002 192901 80862 1686043 0 0 00:01:36 143

----------------------------

show bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 221.221.221.221, local AS number 65002
BGP table version is 377, L2VPN EVPN config peers 4, capable peers 4
143 network entries and 143 paths using 34892 bytes of memory
BGP attribute entries [31/5332], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
121.121.121.121 4 65001 142 534 377 0 0 00:00:34 0
122.122.122.122 4 65001 145 535 377 0 0 00:00:16 0
211.211.211.211 4 65002 253 218 377 0 0 03:23:49 86
212.212.212.212 4 65002 232 210 377 0 0 03:21:03 57

---------------------------

show bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 2.2.2.3, local AS number 65002
BGP table version is 407, L2VPN EVPN config peers 4, capable peers 4
143 network entries and 229 paths using 45212 bytes of memory
BGP attribute entries [50/8600], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [1/8]

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
121.121.121.121 4 65001 128 517 407 0 0 00:02:41 0
122.122.122.122 4 65001 140 517 407 0 0 00:02:57 0
211.211.211.211 4 65002 256 216 407 0 0 03:23:37 86
212.212.212.212 4 65002 267 207 407 0 0 03:21:37 143

show bgp l2vpn evpn summary

...

222.222.222.222 4 65002 3759 646 786094 0 0 00:00:01 143

---------------------------

show bgp l2vpn evpn summary
...
222.222.222.222 4 65002 192901 80862 1686043 0 0 00:01:36 143

Where the other peer shows only around 50 messages sent, there is an MTU mismatch: one side sends with MTU 9000 and the other uses the default 1500. That is why we see many messages received between the two show commands, even though the sender shows only around 50 messages sent, i.e. each 9000-byte message is fragmented into many 1500-byte messages.
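
The fragmentation arithmetic behind this reply can be sketched as follows, assuming plain IPv4 fragmentation with 20-byte headers. Every fragment except the last carries the largest multiple of 8 bytes that fits in the MTU minus the header. The helper name is hypothetical. It counts IP fragments only and says nothing about how BGP itself counts messages.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options


def ipv4_fragment_count(packet_len: int, mtu: int) -> int:
    """Number of IPv4 fragments needed to send a packet of packet_len bytes over mtu."""
    if packet_len <= mtu:
        return 1
    payload = packet_len - IPV4_HEADER
    # Fragment offsets are in 8-byte units, so non-final fragments carry a multiple of 8.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return -(-payload // per_fragment)  # ceiling division


print(ipv4_fragment_count(9028, 1500))
```

For example, the 9028-byte IPv4 packet produced by a 9000-byte ping needs 7 fragments on a 1500-byte link (six of 1480 bytes of payload plus one of 128).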

Command reference:
ping
{ dest-address | hostname } [ count { number | unlimited }] [ df-bit ] [ interval seconds ] [ packet-size bytes ] [source src-address ] [ timeout seconds ] [ vrf { vrf-name | default | management }]

# ping 2.2.2.3 source 121.121.121.121 packet-size 8500 df-bit   <- without this bit set the packet is fragmented; we must make sure that the whole path has MTU 9000

When I'm using the df-bit, I'm still able to ping.

ping 2.2.2.3 source 121.121.121.121 packet-size 8500 df-bit
PING 2.2.2.3 (2.2.2.3) from 121.121.121.121: 8500 data bytes
8508 bytes from 2.2.2.3: icmp_seq=0 ttl=252 time=3.685 ms
8508 bytes from 2.2.2.3: icmp_seq=1 ttl=252 time=3.389 ms
8508 bytes from 2.2.2.3: icmp_seq=2 ttl=252 time=3.343 ms
8508 bytes from 2.2.2.3: icmp_seq=3 ttl=252 time=3.714 ms
8508 bytes from 2.2.2.3: icmp_seq=4 ttl=252 time=3.305 ms

How can I confirm what MTU it is using?

For IOS, and I think it's the same for Nexus:

show ip bgp neighbor x.x.x.x

The segment size in that output shows the MTU size used toward this neighbor.

Nothing comes back when I add include segment.

show ip bgp neighbor 121.121.121.121
BGP neighbor is 121.121.121.121, remote AS 65001, ebgp link, Peer index 3
Inherits peer configuration from peer-template INTER-BGP-PEER
BGP version 4, remote router ID 121.121.121.121
Neighbor previous state = OpenConfirm
BGP state = Established, up for 00:01:03
Neighbor vrf: default
Using loopback3 as update source for this peer
Enable logging neighbor events
External BGP peer might be up to 10 hops away
Last read 00:01:03, hold time = 180, keepalive interval is 60 seconds
Last written 00:00:02, keepalive timer expiry due 00:00:57
Received 188 messages, 0 notifications, 0 bytes in queue
Sent 759 messages, 93 notifications, 0(0) bytes in queue
Enhanced error processing: On
0 discarded attributes
Connections established 94, dropped 93
Last reset by us 00:01:14, due to holdtimer expired error
Last error length sent: 0
Reset error value sent: 0
Reset error sent major: 4 minor: 0
Notification data sent:
Last reset by peer never, due to No error
Last error length received: 0
Reset error value received 0
Reset error received major: 0 minor: 0
Notification data received:

Neighbor capabilities:
Dynamic capability: advertised (mp, refresh, gr) received (mp, refresh, gr)
Dynamic capability (old): advertised received
Route refresh capability (new): advertised received
Route refresh capability (old): advertised received
4-Byte AS capability: advertised received
Address family IPv4 Unicast: advertised received
Address family L2VPN EVPN: advertised received
Graceful Restart capability: advertised received

Graceful Restart Parameters:
Address families advertised to peer:
IPv4 Unicast L2VPN EVPN
Address families received from peer:
IPv4 Unicast L2VPN EVPN
Forwarding state preserved by peer for:
IPv4 Unicast L2VPN EVPN
Restart time advertised to peer: 120 seconds
Stale time for routes advertised by peer: 300 seconds
Restart time advertised by peer: 120 seconds
Extended Next Hop Encoding Capability: advertised received
Receive IPv6 next hop encoding Capability for AF:
IPv4 Unicast VPNv4 Unicast

Message statistics:
Sent Rcvd
Opens: 95 94
Notifications: 93 0
Updates: 3309 0
Keepalives: 375 94
Route Refresh: 0 0
Capability: 2 0
Total: 759 188
Total bytes: 724366 0
Bytes in queue: 0 0

For address family: IPv4 Unicast
BGP table version 187, neighbor version 187
0 accepted prefixes (0 paths), consuming 0 bytes of memory
0 received prefixes treated as withdrawn
0 sent prefixes (0 paths)
Community attribute sent to this neighbor
Extended community attribute sent to this neighbor
Last End-of-RIB sent 00:00:01 after session start
First convergence 00:00:01 after session start with 0 routes sent

For address family: L2VPN EVPN
BGP table version 537, neighbor version 537
0 accepted prefixes (0 paths), consuming 0 bytes of memory
0 received prefixes treated as withdrawn
146 sent prefixes (146 paths)
Community attribute sent to this neighbor
Extended community attribute sent to this neighbor
Allow my ASN 3 times
Advertise GW IP is enabled
Outbound route-map configured is NH-Unchanged, handle obtained
Last End-of-RIB sent 00:00:01 after session start
First convergence 00:00:01 after session start with 146 routes sent

Local host: 2.2.2.3, Local port: 179
Foreign host: 121.121.121.121, Foreign port: 42575
fd = 74
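
The major/minor numbers in this output are BGP NOTIFICATION error codes and subcodes. RFC 4271 assigns code 4 to Hold Timer Expired, which matches "Last reset by us ... due to holdtimer expired error" above. The lookup below covers the RFC 4271 error codes. Treating 0 as "no notification recorded" is an assumption based on the "Last reset by peer never, due to No error" line. The function name is hypothetical.

```python
# BGP NOTIFICATION error codes as defined in RFC 4271, section 4.5.
BGP_ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}


def describe_notification(major: int, minor: int) -> str:
    """Turn 'major: X minor: Y' from show ip bgp neighbor into readable text."""
    if major == 0:
        # Assumption: the output shows 0 when no NOTIFICATION has been recorded.
        return "No notification recorded"
    name = BGP_ERROR_CODES.get(major, f"Unknown error code {major}")
    return f"{name} (subcode {minor})"


print(describe_notification(4, 0))
```

Applied to this neighbor, the sent side decodes to Hold Timer Expired, while the received side (major 0) shows the peer never sent a NOTIFICATION.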