EVPN VXLAN NVE Peers only showing up on one side and are inconsistent

shannonr
Level 1

I am setting up EVPN-VXLAN as an overlay to provide L2 reachability.

I have 2 spines and 4 leaves. Spines are Cat 9500-16x and Leaves are a mix of Cat 9500-16x and Cat 9500-48y4c (2 of each).

Looking at the NVE peers on all the switches:

  • Spine 1 peers with Leaves 1-2
  • Spine 2 peers with Leaves 1-2
  • Leaf 1 peers with Spines 1-2 and Leaf 2
  • Leaf 2 peers with Spines 1-2 and Leaf 1
  • Leaf 3 thinks it peers with Spines 1-2 and Leaves 1-2
  • Leaf 4 thinks it peers with Spines 1-2 and Leaves 1-2

I'm not sure why Leaves 1-2 peer with each other. And I'm not sure why Leaves 3-4 think they are peered with everything (except with each other) when the other end doesn't recognise the peering.

I also receive these errors in the logs on the 9500-16x switches:
*Jul 9 01:42:31.759: NVE-MGR-EI: L2FIB rvtep 100102:UNKNOWN cfg type 2 for BD 102
*Jul 9 01:42:31.759: NVE-MGR-EI ERROR: Invalid peer address for bd 102

These errors are definitely related, as they stop if I disable the NVE interfaces on the 9500-48 switches (Leaves 3-4).

Config is identical on all leaves.

  • Spine1 = 10.254.1.1 (NVE IP, source interface loopback 1)
  • Spine2 = 10.254.1.2
  • Leaf 1 (9500-16) = 10.254.1.3
  • Leaf 2 (9500-16) = 10.254.1.4
  • Leaf 3 (9500-48) = 10.254.1.5
  • Leaf 4 (9500-48) = 10.254.1.6

 

Output from Leaf 3 showing it has a peering to Leaf 2:

RND-9500-48X_VTEP(config-if)#do sh nve peers | i 10.254.1.4
nve1 100100 L2CP 10.254.1.4 4 100100 UP N/A 00:03:49

Output from Leaf 2 showing no peering to Leaf 3:

S4NR-9500-16X_VTEP#sh nve peers | i 10.254.1.5
S4NR-9500-16X_VTEP#

S4NR-9500-16X_VTEP#sh nve peers | i 10.254.1.
nve1 100100 L2CP 10.254.1.1 4 100100 UP N/A 23:49:47
nve1 100100 L2CP 10.254.1.2 4 100100 UP N/A 23:49:47
nve1 100100 L2CP 10.254.1.3 4 100100 UP N/A 23:49:47

Note: the loopbacks used by the NVE interfaces can ping each other so the underlay routing is working fine (OSPF).
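For anyone wanting to run the same check: sourcing the ping from Loopback1 makes sure the test uses the same addresses the NVE tunnels use, e.g. from Leaf 1 towards Leaf 2:

ping 10.254.1.4 source Loopback1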

NVE interface config (identical on all switches):

interface nve1
no ip address
source-interface Loopback1
host-reachability protocol bgp
end
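The per-VNI mapping lives elsewhere in the config and is trimmed above. As a rough sketch of the usual Catalyst 9500 pieces (the EVI number, VLAN and replication mode here are illustrative, not my exact values):

l2vpn evpn
 replication-type ingress
 router-id Loopback1
!
l2vpn evpn instance 100 vlan-based
!
vlan configuration 100
 member evpn-instance 100 vni 100100
!
interface nve1
 member vni 100100 ingress-replication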

What is going on?

I'm not sure how to troubleshoot further.

Thanks!

48 Replies

Can you share the config?

MHM

Hi,

I've attached the config for one of the spines as well as Leaf 1 (9500-16x) and Leaf 4 (9500-48). For brevity I have cut out the config that could not have any relevance, to make reading a bit easier.

I think this might simply be a fundamental knowledge gap on my end of how this works. As a test I moved a traditional access switch that was connected to Spine 2 (for testing only) over to Leaf 4. Now Leaf 4 has two-way NVE peering with Spine 1 and Leaf 1, Spine 2 is exhibiting behaviour similar to what I was previously seeing on Leaves 3-4, and Leaf 1 and Leaf 2 are no longer peered either.

So the issue seems to move around depending on what is plugged in where. This is a lab build, so not every leaf switch has anything plugged in to bring up the VLANs associated with the VNIs.

On the underlay OSPF is up on all the physical links - every switch in the topology knows how to get to the loopback 1 interface (used for nve peering) on every other switch, and on the overlay BGP is established between all the leaves and the spines correctly.

I wonder if I simply don't understand how the peering gets established, and what triggers it? 

Hello @shannonr 

To make things easy, please configure the spines as BGP route reflectors:

neighbor x.x.x.x route-reflector-client

Also, an NVE interface is not needed on the spines.

Make sure all BGP peerings on the spines are configured with the route-reflector-client command, and check the state of all BGP peerings.
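As a sketch, using the Loopback0 peering addresses from this thread, the per-neighbor form under the L2VPN EVPN address family would look like:

router bgp 65550
 address-family l2vpn evpn
  neighbor 10.254.0.3 activate
  neighbor 10.254.0.3 route-reflector-client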

 

Best regards
.ı|ı.ı|ı. If This Helps, Please Rate .ı|ı.ı|ı.

Hi,

Thanks for the response!

The RR client is already configured via the peer-policy template, unless I've misunderstood something?

router bgp 65550
template peer-policy EVPN-VXLAN-PP
route-reflector-client
send-community both
exit-peer-policy
!
template peer-session EVPN-VXLAN-PS
remote-as 65550
update-source Loopback0
exit-peer-session
!
bgp log-neighbor-changes
neighbor 10.254.0.3 remote-as 65550
neighbor 10.254.0.3 inherit peer-session EVPN-VXLAN-PS
neighbor 10.254.0.3 update-source Loopback0
neighbor 10.254.0.4 remote-as 65550
neighbor 10.254.0.4 inherit peer-session EVPN-VXLAN-PS
neighbor 10.254.0.4 update-source Loopback0
neighbor 10.254.0.5 remote-as 65550
neighbor 10.254.0.5 inherit peer-session EVPN-VXLAN-PS
neighbor 10.254.0.5 update-source Loopback0
neighbor 10.254.0.6 remote-as 65550
neighbor 10.254.0.6 inherit peer-session EVPN-VXLAN-PS
neighbor 10.254.0.6 update-source Loopback0
!
address-family l2vpn evpn
neighbor 10.254.0.3 activate
neighbor 10.254.0.3 send-community both
neighbor 10.254.0.3 inherit peer-policy EVPN-VXLAN-PP
neighbor 10.254.0.4 activate
neighbor 10.254.0.4 send-community both
neighbor 10.254.0.4 inherit peer-policy EVPN-VXLAN-PP
neighbor 10.254.0.5 activate
neighbor 10.254.0.5 send-community both
neighbor 10.254.0.5 inherit peer-policy EVPN-VXLAN-PP
neighbor 10.254.0.6 activate
neighbor 10.254.0.6 send-community both
neighbor 10.254.0.6 inherit peer-policy EVPN-VXLAN-PP
exit-address-family
!
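One way to double-check what the template actually contains (if the command is available on your release) is:

show ip bgp template peer-policy EVPN-VXLAN-PP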

I will shut down the NVE interfaces on the spines tomorrow to simplify things - they were enabled because some access switches were still attached to the spines for temporary testing, and I will need to remove those.

What triggers an NVE peering to come up, and what determines which devices it peers with? It seems dynamic and I can't work out the logic of how peers are decided.

Oh, thanks @shannonr for that clarification. Sorry.

Trying to simplify... NVE peering in EVPN-VXLAN is established dynamically based on BGP EVPN advertisements. Devices within the EVPN-VXLAN fabric use BGP to announce VXLAN tunnel endpoints (VTEPs) and their associated VNIs. The NVE interfaces on these devices learn about each other through these BGP advertisements. BGP sessions are set up between devices, using iBGP (within the same AS). Once these BGP sessions are up, devices exchange EVPN routes that contain VTEP information, leading to the dynamic formation of NVE peers based on this learned data.

The trigger for NVE peering coming up is the establishment of BGP sessions between devices. When a BGP session comes up, the devices exchange EVPN routes, each containing information about VTEPs and the VNIs they support. As a device receives EVPN routes from other devices, it dynamically learns about new VTEPs and establishes NVE peers accordingly. As long as the BGP configuration is correct and each device is advertising the correct VTEP and VNI information, NVE peering should establish automatically.
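Concretely, assuming ingress replication, the entry in show nve peers is created when a route-type 3 (inclusive multicast) EVPN advertisement for a shared VNI arrives from a remote VTEP. To see those routes (the route-type filter exists on recent IOS XE releases; otherwise grep the full table, where type-3 prefixes start with [3]):

show bgp l2vpn evpn route-type 3
show bgp l2vpn evpn | include \[3\]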

To simplify your setup and troubleshoot the issue, temporarily disabling the NVE interfaces on the spines can help isolate the problem to just the leaf switches. This lets you focus on the leaf-to-leaf peering and find the root cause of any discrepancies. Please verify that each leaf is correctly peering only with the spines and not directly with the other leaves, unless that is specifically required. Additionally, make sure BGP EVPN routes are correctly advertised by each leaf; commands like show bgp l2vpn evpn summary can help verify the status of the BGP neighbors.

 

Best regards
.ı|ı.ı|ı. If This Helps, Please Rate .ı|ı.ı|ı.

Thanks for that - that is super helpful and I appreciate the detail.

I will go through it tomorrow when I am in front of the config and can have that access switch moved over from the other spine. What you've said makes a lot of sense: NVE peering is not being established on leaf switches that currently have no access switches or hosts connected, so they have nothing to advertise, which would be why the peering doesn't dynamically come up on both sides.

Once that access switch gets moved every leaf switch will have active hosts attached and I should see nve peering between all leaf switches.
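One way to sanity-check that theory per leaf (the EVI number below is illustrative, and command availability may vary by release) is to look at whether the VNI and its EVI are actually up:

show nve vni
show l2vpn evpn evi 100 detail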

Confirmed - BGP peering is only done between leaf and spine switches - there is no peering done between leaf switches directly. BGP peering between all leaf switches and the 2 spines is established correctly.

Hi,

I have now been able to get the cable moved so that all access switches are only plugged into leaf switches and not spines.
With this, NVE peering is up between all leaf switches and there is end-to-end connectivity for hosts across the fabric.

I have removed the NVE interfaces from the spines completely.

I am still seeing this error on the 2x 9500-16x leaf switches: 

[07/11/24 08:54:15.892 NZST 3E800 392] NVE-MGR-EI: L2FIB rvtep 100102:UNKNOWN cfg type 2 for BD 102
[07/11/24 08:54:15.892 NZST 3E801 392] NVE-MGR-EI ERROR: Invalid peer address for bd 102

BGP is up between all leaf switches and the spines. There is no BGP peering directly between leaf switches. Neither BGP nor NVE peering is flapping. The error repeats constantly, every 15-45 seconds.

It does not appear on the 2x 9500-48 leaf switches, and if I remove them from the topology the errors stop, so it is definitely related to those.

Any idea what the error is saying/what is causing it? I'm reluctant to move this forward closer to production with this going on. 

I will check this tonight 

MHM

Thanks for that.

I enabled debug logging on the nve and here is a bit more output in case it helps:

[07/11/24 09:21:23.267 NZST 40808 392] NVE-MGR-EI: L2FIB rvtep 100100:10.254.1.6 cfg type 1 for BD 100
[07/11/24 09:21:23.267 NZST 40809 392] NVE-MGR-DB: creating peer node for 10.254.1.6
[07/11/24 09:21:23.267 NZST 4080A 392] NVE-DB-EVT: Add peer 10.254.1.6 on VNI 100100 on interface 0x45
[07/11/24 09:21:23.267 NZST 4080B 392] NVE-DB-EVT: Added peer 10.254.1.6 on VNI 100100 on NVE 1
[07/11/24 09:21:23.267 NZST 4080C 392] NVE-DB-EVT: Add VNI 100100 to peer 10.254.1.6 source 4
[07/11/24 09:21:23.267 NZST 4080D 392] NVE-DB-EVT: Add VNI 100100 to peer 10.254.1.6 NVE 1 rc 0 flags 4 state 32
[07/11/24 09:21:23.267 NZST 4080E 392] NVE-DB-EVT: Added VNI 100100 on Peer 10.254.1.6
[07/11/24 09:21:23.267 NZST 4080F 392] NVE-MGR-DB: PEER 1/10.254.1.6 ADD oper
[07/11/24 09:21:23.268 NZST 40810 392] NVE-MGR-TUNNEL: Tunnel Endpoint 10.254.1.6 added, dport 4789
[07/11/24 09:21:23.268 NZST 40811 392] NVE-MGR-EI: L2FIB rvtep 100100:UNKNOWN cfg type 2 for BD 100
[07/11/24 09:21:23.268 NZST 40812 392] NVE-MGR-EI ERROR: Invalid peer address for bd 100

It cycles through each VNI exactly the same.
This was taken on one of the 9500-16x leaf switches. 10.254.1.6 is one of the 9500-48 leaf switches.

show ip ospf neighbor
show bgp l2vpn evpn summary

Share the output of both

MHM

Hi,

Output as requested below. As a reference:

  • Physical interfaces / 172.16.12.x = Used for OSPF underlay
  • Loopback 0 / 10.254.0.x = Used for BGP overlay
  • Loopback 1 / 10.254.1.x = Used for NVE peering
  • 10.254.x.1 = Spine 1
  • 10.254.x.2 = Spine 2
  • 10.254.x.3 = Leaf 1 (9500-16x)
  • 10.254.x.4 = Leaf 2 (9500-16x)
  • 10.254.x.5 = Leaf 3 (9500-48x)
  • 10.254.x.6 = Leaf 4 (9500-48x)

Spine 1:
xxx-9500-16X_SPINE#sh ip ospf ne

Neighbor ID Pri State Dead Time Address Interface
10.254.0.5 0 FULL/ - 00:00:33 172.16.12.22 TenGigabitEthernet2/0/2
10.254.0.3 0 FULL/ - 00:00:32 172.16.12.6 TenGigabitEthernet2/0/1
10.254.0.4 0 FULL/ - 00:00:31 172.16.12.10 TenGigabitEthernet1/0/2
10.254.0.2 0 FULL/ - 00:00:34 172.16.12.2 TenGigabitEthernet1/0/1

xxx-9500-16X_SPINE#show bgp l2vpn evpn su
BGP router identifier 10.254.1.1, local AS number 65550
BGP table version is 787941, main routing table version 787941
72 network entries using 27648 bytes of memory
72 path entries using 16704 bytes of memory
28/28 BGP path/bestpath attribute entries using 8288 bytes of memory
6 BGP extended community entries using 240 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 52880 total bytes of memory
BGP activity 9124/9052 prefixes, 393722/393650 paths, scan interval 60 secs
960 networks peaked at 14:55:09 Jul 9 2024 NZST (2d00h ago)

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.254.0.3 4 65550 5179 12916 787941 0 0 1d04h 18
10.254.0.4 4 65550 5076 12923 787941 0 0 1d04h 18
10.254.0.5 4 65550 2686 12931 787941 0 0 1d04h 18
10.254.0.6 4 65550 5354 12922 787941 0 0 1d04h 18

Spine 2:
xxx-9500-16X_SPINE#sh ip ospf ne

Neighbor ID Pri State Dead Time Address Interface
10.254.0.6 0 FULL/ - 00:00:37 172.16.12.26 TenGigabitEthernet1/0/2
10.254.0.3 0 FULL/ - 00:00:39 172.16.12.14 TenGigabitEthernet2/0/1
10.254.0.1 0 FULL/ - 00:00:36 172.16.12.1 TenGigabitEthernet1/0/1

xxx-9500-16X_SPINE#sh bgp l2vpn evpn su
BGP router identifier 10.254.1.2, local AS number 65550
BGP table version is 787817, main routing table version 787817
72 network entries using 27648 bytes of memory
72 path entries using 16704 bytes of memory
28/28 BGP path/bestpath attribute entries using 8288 bytes of memory
6 BGP extended community entries using 240 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 52880 total bytes of memory
BGP activity 9388/9316 prefixes, 393594/393522 paths, scan interval 60 secs
960 networks peaked at 13:49:30 Jun 18 2024 NZST (3w2d ago)

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.254.0.3 4 65550 5180 12906 787817 0 0 1d04h 18
10.254.0.4 4 65550 5083 12922 787817 0 0 1d04h 18
10.254.0.5 4 65550 2690 12923 787817 0 0 1d04h 18
10.254.0.6 4 65550 5350 12929 787817 0 0 1d04h 18

 

Leaf 1:
xxx-9500-16X_VTEP#sh ip ospf ne

Neighbor ID Pri State Dead Time Address Interface
10.254.0.2 0 FULL/ - 00:00:38 172.16.12.13 TenGigabitEthernet2/0/1
10.254.0.4 0 FULL/ - 00:00:39 172.16.12.18 TenGigabitEthernet1/0/2
10.254.0.1 0 FULL/ - 00:00:33 172.16.12.5 TenGigabitEthernet1/0/1

xxx-9500-16X_VTEP#sh bgp l2vpn evpn sum
BGP router identifier 10.254.1.3, local AS number 65550
BGP table version is 1050857, main routing table version 1050857
126 network entries using 48384 bytes of memory
180 path entries using 41760 bytes of memory
32/32 BGP path/bestpath attribute entries using 9472 bytes of memory
6 BGP rrinfo entries using 240 bytes of memory
6 BGP extended community entries using 240 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 100096 total bytes of memory
BGP activity 9873/9747 prefixes, 562995/562815 paths, scan interval 60 secs
962 networks peaked at 13:50:07 Jun 18 2024 NZST (3w2d ago)

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.254.0.1 4 65550 12928 5185 1050857 0 0 1d04h 54
10.254.0.2 4 65550 12913 5185 1050857 0 0 1d04h 54

Leaf 2:
xxx-9500-16X_VTEP#sh ip ospf ne

Neighbor ID Pri State Dead Time Address Interface
10.254.0.3 0 FULL/ - 00:00:34 172.16.12.17 TenGigabitEthernet2/0/1
10.254.0.1 0 FULL/ - 00:00:32 172.16.12.9 TenGigabitEthernet1/0/1

xxx-9500-16X_VTEP#sh bgp l2vpn evpn sum
BGP router identifier 10.254.1.4, local AS number 65550
BGP table version is 997339, main routing table version 997339
126 network entries using 48384 bytes of memory
180 path entries using 41760 bytes of memory
32/32 BGP path/bestpath attribute entries using 9472 bytes of memory
6 BGP rrinfo entries using 240 bytes of memory
6 BGP extended community entries using 240 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 100096 total bytes of memory
BGP activity 10766/10640 prefixes, 526804/526624 paths, scan interval 60 secs
961 networks peaked at 13:50:11 Jun 18 2024 NZST (3w2d ago)

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.254.0.1 4 65550 12968 5090 997339 0 0 1d04h 54
10.254.0.2 4 65550 12962 5091 997339 0 0 1d04h 54

Leaf 3:
xxx-9500-48X_VTEP#sh ip ospf ne

Neighbor ID Pri State Dead Time Address Interface
10.254.0.1 0 FULL/ - 00:00:36 172.16.12.21 TwentyFiveGigE1/0/1

xxx-9500-48X_VTEP#sh bgp l2vpn evpn sum
BGP router identifier 10.254.1.5, local AS number 65550
BGP table version is 115258, main routing table version 115258
126 network entries using 48384 bytes of memory
180 path entries using 41760 bytes of memory
32/32 BGP path/bestpath attribute entries using 9472 bytes of memory
6 BGP rrinfo entries using 240 bytes of memory
6 BGP extended community entries using 240 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 100096 total bytes of memory
BGP activity 2718/2592 prefixes, 71917/71737 paths, scan interval 60 secs
978 networks peaked at 14:55:37 Jul 9 2024 NZST (2d00h ago)

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.254.0.1 4 65550 12985 2705 115258 0 0 1d04h 54
10.254.0.2 4 65550 12972 2707 115258 0 0 1d04h 54

Leaf 4:
xxx-9500-48X_VTEP#sh ip ospf ne

Neighbor ID Pri State Dead Time Address Interface
10.254.0.2 0 FULL/ - 00:00:35 172.16.12.25 TwentyFiveGigE1/0/1

xxx-9500-48X_VTEP#sh bgp l2vpn evpn sum
BGP router identifier 10.254.1.6, local AS number 65550
BGP table version is 93206, main routing table version 93206
126 network entries using 48384 bytes of memory
180 path entries using 41760 bytes of memory
32/32 BGP path/bestpath attribute entries using 9472 bytes of memory
6 BGP rrinfo entries using 240 bytes of memory
6 BGP extended community entries using 240 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 100096 total bytes of memory
BGP activity 1229/1103 prefixes, 56601/56421 paths, scan interval 60 secs
543 networks peaked at 14:55:37 Jul 9 2024 NZST (2d00h ago)

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.254.0.1 4 65550 12982 5374 93206 0 0 1d04h 54
10.254.0.2 4 65550 12984 5368 93206 0 0 1d04h 54

It seems that your leaf-to-spine connectivity is not correct.

Each leaf must connect to both spines, and the spines must not interconnect with each other.

You can see that Leaf 4 has an OSPF neighbourship with only one spine.

MHM

Hey,

Thanks for that. My understanding is that the OSPF underlay is there to provide reachability for the overlay. It is not possible to have a physical connection between, say, Leaf 4 and Spine 1, but OSPF between Leaf 4 and Spine 2 provides reachability for BGP, and BGP peering is established between Leaf 4 and both spines to form the fabric.

I understand it is sub-optimal, but this is a multi-pod type topology across diverse locations, and a full mesh of physical connectivity between all leaves and spines is not possible. A full mesh in the overlay is possible and is in place.
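For reference, a quick way to confirm the indirect underlay path (e.g. Leaf 4 reaching Spine 1's loopbacks via Spine 2) would be something like:

ping 10.254.0.1 source Loopback0
show ip route 10.254.1.1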

Please correct me if I am missing something?

@shannonr 

In a traditional spine and leaf architecture, every leaf switch typically connects to every spine switch, ensuring a consistent number of hops (usually two) between any two leaf switches. This design provides predictable latency and redundancy.

However, in a multi-pod or geographically diverse setup, it's not always feasible to have a direct physical connection between every leaf and every spine switch. So your setup, which involves an incomplete physical fabric but relies on OSPF for underlay reachability, is a practical approach for such scenarios.

 

Best regards
.ı|ı.ı|ı. If This Helps, Please Rate .ı|ı.ı|ı.
.ı|ı.ı|ı. If This Helps, Please Rate .ı|ı.ı|ı.