EVPN VXLAN NVE Peers only showing up on one side and are inconsistent

shannonr
Level 1

I am setting up EVPN-VXLAN as an overlay to provide L2 reachability.

I have 2 spines and 4 leaves. Spines are Cat 9500-16x and Leaves are a mix of Cat 9500-16x and Cat 9500-48y4c (2 of each).

Looking at the NVE peers on all the switches:

  • Spine1 peers with Leaf 1-2
  • Spine 2 peers with Leaf 1-2
  • Leaf 1 peers with Spine 1,2, Leaf 2
  • Leaf 2 peers with Spine 1,2, Leaf 1.
  • Leaf 3 thinks it peers with Spine 1-2, and Leaf 1-2
  • Leaf 4 thinks it peers with Spine1-2 and Leaf 1-2.

I'm not sure why leaves 1-2 peer with each other. And I'm not sure why leaves 3-4 think they are peered with everything (except with each other), when the other peer doesn't recognise the peering.
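The asymmetry described above can be made explicit with a small script. This is only a sketch: the switch names are labels for the peer lists in this post, and `one_sided_peerings` is a hypothetical helper, not a Cisco tool.

```python
# Peer views as described in the post: each switch maps to the set of
# switches it lists as NVE peers. Names are illustrative labels only.
reported_peers = {
    "spine1": {"leaf1", "leaf2"},
    "spine2": {"leaf1", "leaf2"},
    "leaf1": {"spine1", "spine2", "leaf2"},
    "leaf2": {"spine1", "spine2", "leaf1"},
    "leaf3": {"spine1", "spine2", "leaf1", "leaf2"},
    "leaf4": {"spine1", "spine2", "leaf1", "leaf2"},
}


def one_sided_peerings(views):
    """Return sorted (local, remote) pairs where local lists remote
    as a peer but remote does not list local."""
    return sorted(
        (local, remote)
        for local, remotes in views.items()
        for remote in remotes
        if local not in views.get(remote, set())
    )


for local, remote in one_sided_peerings(reported_peers):
    print(f"{local} lists {remote}, but {remote} does not list {local}")
```

With the lists above, every one-sided entry originates from leaf3 or leaf4, which matches the description that only the 9500-48 leaves see peerings the other side does not.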

I also receive these errors in the logs on the 9500-16x switches:
*Jul 9 01:42:31.759: NVE-MGR-EI: L2FIB rvtep 100102:UNKNOWN cfg type 2 for BD 102
*Jul 9 01:42:31.759: NVE-MGR-EI ERROR: Invalid peer address for bd 102

These errors are definitely related: if I disable the NVE interfaces on the 9500-48 switches (leaves 3-4), the errors stop.

Config is identical on all leaves.

  • Spine1 = 10.254.1.1 (NVE IP, source interface loopback 1)
  • Spine2 = 10.254.1.2
  • Leaf 1 (9500-16) = 10.254.1.3
  • Leaf 2 (9500-16) = 10.254.1.4
  • Leaf 3 (9500-48) = 10.254.1.5
  • Leaf 4 (9500-48) = 10.254.1.6

 

Output from Leaf 3 (10.254.1.5) showing it has a peering to Leaf 2 (10.254.1.4):

RND-9500-48X_VTEP(config-if)#do sh nve peers | i 10.254.1.4
nve1 100100 L2CP 10.254.1.4 4 100100 UP N/A 00:03:49

Output from Leaf 2 (10.254.1.4) showing no peering to Leaf 3 (10.254.1.5):

S4NR-9500-16X_VTEP#sh nve peers | i 10.254.1.5
S4NR-9500-16X_VTEP#

S4NR-9500-16X_VTEP#sh nve peers | i 10.254.1.
nve1 100100 L2CP 10.254.1.1 4 100100 UP N/A 23:49:47
nve1 100100 L2CP 10.254.1.2 4 100100 UP N/A 23:49:47
nve1 100100 L2CP 10.254.1.3 4 100100 UP N/A 23:49:47

Note: the loopbacks used by the NVE interfaces can ping each other so the underlay routing is working fine (OSPF).
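To compare peer tables across switches without reading them by eye, the `show nve peers` lines can be parsed into records. The following is a rough sketch that assumes the nine-column layout seen in the output above (interface, VNI, type, peer IP, RMAC/Num_RTs, eVNI, state, flags, uptime); `NvePeer` and `parse_nve_peers` are hypothetical names, and other releases may print different columns.

```python
from typing import NamedTuple


class NvePeer(NamedTuple):
    interface: str
    vni: int
    peer_type: str
    peer_ip: str
    rmac_or_num_rts: str
    evni: int
    state: str
    flags: str
    uptime: str


def parse_nve_peers(text):
    """Parse peer rows from 'show nve peers' text, skipping prompts,
    headers, and any line that does not have exactly nine fields."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9 or not fields[0].startswith("nve"):
            continue
        iface, vni, ptype, ip, rmac, evni, state, flags, uptime = fields
        peers.append(
            NvePeer(iface, int(vni), ptype, ip, rmac, int(evni), state, flags, uptime)
        )
    return peers


# Output from Leaf 2 (10.254.1.4), copied from the post.
sample = """\
S4NR-9500-16X_VTEP#sh nve peers | i 10.254.1.
nve1 100100 L2CP 10.254.1.1 4 100100 UP N/A 23:49:47
nve1 100100 L2CP 10.254.1.2 4 100100 UP N/A 23:49:47
nve1 100100 L2CP 10.254.1.3 4 100100 UP N/A 23:49:47
"""

peers = parse_nve_peers(sample)
all_vteps = {f"10.254.1.{n}" for n in range(1, 7)}  # VTEP IPs listed in the post
local_vtep = "10.254.1.4"
seen = {p.peer_ip for p in peers}
missing = sorted(all_vteps - {local_vtep} - seen)
print("VTEPs with no peer entry on this switch:", missing)
```

Running the same parser against each switch's output gives the per-switch peer sets needed to spot one-sided peerings.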

NVE interface config (identical on all switches):

interface nve1
no ip address
source-interface Loopback1
host-reachability protocol bgp
end

What is going on?

I'm not sure how to troubleshoot further.

Thanks!


Oops sorry, didn't read that properly.

That command doesn't exist with the peer IP - the closest I can find is show bgp l2vpn evpn neighbors <peer ip> which shows the neighbor details for the evpn BGP peering to the 2x spine switches.

#show bgp l2vpn evpn neighbors 10.254.0.1
BGP neighbor is 10.254.0.1, remote AS 65550, internal link
BGP version 4, remote router ID 10.254.0.1
BGP state = Established, up for 00:40:10
Last read 00:00:42, last write 00:00:45, hold time is 180, keepalive interval is 60 seconds
Last update received: 00:03:35
Neighbor sessions:
1 active, is not multisession capable (disabled)
Neighbor capabilities:
Route refresh: advertised and received(new)
Four-octets ASN Capability: advertised and received
Address family L2VPN Evpn: advertised and received
Enhanced Refresh Capability: advertised and received
Multisession Capability:
Stateful switchover support enabled: NO for session 1
Message statistics:
InQ depth is 0
OutQ depth is 0

Sent Rcvd
Opens: 1 1
Notifications: 0 0
Updates: 108 287
Keepalives: 40 32
Route Refresh: 0 0
Total: 149 322
Do log neighbor state changes (via global configuration)
Default minimum time between advertisement runs is 0 seconds

For address family: L2VPN E-VPN
Session: 10.254.0.1
BGP table version 1238, neighbor version 1238/0
Output queue size : 0
Index 1, Advertise bit 0
1 update-group member
Community attribute sent to this neighbor
Extended-community attribute sent to this neighbor
Slow-peer detection is disabled
Slow-peer split-update-group dynamic is disabled
Prefers VxLAN if VTEP is UP else MPLS
Sent Rcvd
Prefix activity: ---- ----
Prefixes Current: 16 48 (Consumes 22272 bytes)
Prefixes Total: 100 209
Implicit Withdraw: 28 0
Explicit Withdraw: 56 161
Used as bestpath: n/a 96
Used as multipath: n/a 0
Used as secondary: n/a 0

Outbound Inbound
Local Policy Denied Prefixes: -------- -------
ORIGINATOR loop: n/a 72
Bestpath from this peer: 209 n/a
Bestpath from iBGP peer: 66 n/a
AF Permit Check: 235 n/a
Total: 510 72
Number of NLRIs in the update sent: max 7, min 0
Current session network count peaked at 48 entries at 09:16:12 Jul 19 2024 NZST (00:33:10.760 ago)
Highest network count observed at 48 entries at 09:16:12 Jul 19 2024 NZST (00:33:10.760 ago)
Last detected as dynamic slow peer: never
Dynamic slow peer recovered: never
Refresh Epoch: 2
Last Sent Refresh Start-of-rib: never
Last Sent Refresh End-of-rib: never
Last Received Refresh Start-of-rib: 00:40:10
Last Received Refresh End-of-rib: 00:40:11
Refresh-In took 0 seconds
Sent Rcvd
Refresh activity: ---- ----
Refresh Start-of-RIB 0 1
Refresh End-of-RIB 0 1

Address tracking is enabled, the RIB does have a route to 10.254.0.1
Route to peer address reachability Up: 1; Down: 0
Last notification 00:40:12
Connections established 1; dropped 0
Last reset never
Interface associated: (none) (peering address NOT in same link)
Transport(tcp) path-mtu-discovery is enabled
Graceful-Restart is disabled
SSO is disabled
Connection state is ESTAB, I/O status: 1, unread input bytes: 0
Connection is ECN Disabled, Mininum incoming TTL 0, Outgoing TTL 255
Local host: 10.254.0.6, Local port: 179
Foreign host: 10.254.0.1, Foreign port: 37404
Connection tableid (VRF): 0
Maximum output segment queue size: 50

Enqueued packets for retransmit: 0, input: 0 mis-ordered: 0 (0 bytes)

Event Timers (current time is 0x267603):
Timer Starts Wakeups Next
Retrans 81 0 0x0
TimeWait 0 0 0x0
AckHold 191 70 0x0
SendWnd 0 0 0x0
KeepAlive 0 0 0x0
GiveUp 0 0 0x0
PmtuAger 0 0 0x0
DeadWait 0 0 0x0
Linger 0 0 0x0
ProcessQ 0 0 0x0

iss: 3127897430 snduna: 3127911487 sndnxt: 3127911487
irs: 67791414 rcvnxt: 67832464

sndwnd: 10886 scale: 0 maxrcvwnd: 16384
rcvwnd: 8786 scale: 0 delrcvwnd: 7598

SRTT: 1000 ms, RTTO: 1003 ms, RTV: 3 ms, KRTT: 0 ms
minRTT: 0 ms, maxRTT: 1000 ms, ACK hold: 80 ms
uptime: 2413436 ms, Sent idletime: 45148 ms, Receive idletime: 45228 ms
Status Flags: passive open, gen tcbs
Option Flags: nagle, path mtu capable, SACK option permitted
win-scale
IP Precedence value : 6
Window update Optimisation : Enabled
ACK Optimisation : Dynamic ACK Tuning Enabled

Datagrams (max data segment is 9158 bytes):
Peer MSS: 9158
Rcvd: 237 (out of order: 0), with data: 195, total data bytes: 41049
Sent: 161 (retransmit: 0, fastretransmit: 0, partialack: 0, Second Congestion: 0), with data: 81, total data bytes: 14056

Packets received in fast path: 0, fast processed: 0, slow path: 0
fast lock acquisition failures: 0, slow path: 0
TCP Semaphore 0x70DFBA3E2108 FREE

 

#show bgp l2vpn evpn neighbors 10.254.0.2
BGP neighbor is 10.254.0.2, remote AS 65550, internal link
BGP version 4, remote router ID 10.254.0.2
BGP state = Established, up for 00:40:20
Last read 00:00:15, last write 00:00:42, hold time is 180, keepalive interval is 60 seconds
Last update received: 00:03:45
Neighbor sessions:
1 active, is not multisession capable (disabled)
Neighbor capabilities:
Route refresh: advertised and received(new)
Four-octets ASN Capability: advertised and received
Address family L2VPN Evpn: advertised and received
Enhanced Refresh Capability: advertised and received
Multisession Capability:
Stateful switchover support enabled: NO for session 1
Message statistics:
InQ depth is 0
OutQ depth is 0

Sent Rcvd
Opens: 1 1
Notifications: 0 0
Updates: 108 287
Keepalives: 40 33
Route Refresh: 0 0
Total: 149 323
Do log neighbor state changes (via global configuration)
Default minimum time between advertisement runs is 0 seconds

For address family: L2VPN E-VPN
Session: 10.254.0.2
BGP table version 1238, neighbor version 1238/0
Output queue size : 0
Index 1, Advertise bit 0
1 update-group member
Community attribute sent to this neighbor
Extended-community attribute sent to this neighbor
Slow-peer detection is disabled
Slow-peer split-update-group dynamic is disabled
Prefers VxLAN if VTEP is UP else MPLS
Sent Rcvd
Prefix activity: ---- ----
Prefixes Current: 16 48 (Consumes 11136 bytes)
Prefixes Total: 100 209
Implicit Withdraw: 28 0
Explicit Withdraw: 56 161
Used as bestpath: n/a 0
Used as multipath: n/a 0
Used as secondary: n/a 0

Outbound Inbound
Local Policy Denied Prefixes: -------- -------
ORIGINATOR loop: n/a 72
Bestpath from this peer: 209 n/a
Bestpath from iBGP peer: 66 n/a
AF Permit Check: 235 n/a
Total: 510 72
Number of NLRIs in the update sent: max 7, min 0
Current session network count peaked at 48 entries at 09:16:12 Jul 19 2024 NZST (00:33:20.799 ago)
Highest network count observed at 48 entries at 09:16:12 Jul 19 2024 NZST (00:33:20.799 ago)
Last detected as dynamic slow peer: never
Dynamic slow peer recovered: never
Refresh Epoch: 2
Last Sent Refresh Start-of-rib: never
Last Sent Refresh End-of-rib: never
Last Received Refresh Start-of-rib: 00:40:20
Last Received Refresh End-of-rib: 00:40:21
Refresh-In took 0 seconds
Sent Rcvd
Refresh activity: ---- ----
Refresh Start-of-RIB 0 1
Refresh End-of-RIB 0 1

Address tracking is enabled, the RIB does have a route to 10.254.0.2
Route to peer address reachability Up: 1; Down: 0
Last notification 00:40:21
Connections established 1; dropped 0
Last reset never
Interface associated: (none) (peering address NOT in same link)
Transport(tcp) path-mtu-discovery is enabled
Graceful-Restart is disabled
SSO is disabled
Connection state is ESTAB, I/O status: 1, unread input bytes: 0
Connection is ECN Disabled, Mininum incoming TTL 0, Outgoing TTL 255
Local host: 10.254.0.6, Local port: 179
Foreign host: 10.254.0.2, Foreign port: 52262
Connection tableid (VRF): 0
Maximum output segment queue size: 50

Enqueued packets for retransmit: 0, input: 0 mis-ordered: 0 (0 bytes)

Event Timers (current time is 0x2693F6):
Timer Starts Wakeups Next
Retrans 81 0 0x0
TimeWait 0 0 0x0
AckHold 192 71 0x0
SendWnd 0 0 0x0
KeepAlive 0 0 0x0
GiveUp 0 0 0x0
PmtuAger 0 0 0x0
DeadWait 0 0 0x0
Linger 0 0 0x0
ProcessQ 0 0 0x0

iss: 1439437602 snduna: 1439451659 sndnxt: 1439451659
irs: 4134563666 rcvnxt: 4134604735

sndwnd: 10886 scale: 0 maxrcvwnd: 16384
rcvwnd: 8767 scale: 0 delrcvwnd: 7617

SRTT: 1000 ms, RTTO: 1003 ms, RTV: 3 ms, KRTT: 0 ms
minRTT: 0 ms, maxRTT: 1000 ms, ACK hold: 80 ms
uptime: 2421399 ms, Sent idletime: 16027 ms, Receive idletime: 16107 ms
Status Flags: passive open, gen tcbs
Option Flags: nagle, path mtu capable, SACK option permitted
win-scale
IP Precedence value : 6
Window update Optimisation : Enabled
ACK Optimisation : Dynamic ACK Tuning Enabled

Datagrams (max data segment is 9158 bytes):
Peer MSS: 9158
Rcvd: 239 (out of order: 0), with data: 196, total data bytes: 41068
Sent: 162 (retransmit: 0, fastretransmit: 0, partialack: 0, Second Congestion: 0), with data: 81, total data bytes: 14056

Packets received in fast path: 0, fast processed: 0, slow path: 0
fast lock acquisition failures: 0, slow path: 0
TCP Semaphore 0x70DFBA3E21D8 FREE

 


KAP-9500-48X_VTEP#show bgp l2vpn evpn ?
A.B.C.D/nn Display route-type 2 and 5 routes matching the IP prefix <network>/<length>
H.H.H Display route-type 2 with specified MAC Address (H.H.H or HH:HH:HH:HH:HH:HH)
Hex-string Display route-type 2 with specified MAC Address (6 Octets in hexidecimal)
X:X:X:X::X/<0-128> Display route-type 2 and 5 routes matching the IPv6 prefix <network>/<length>
all Display information about all EVPN NLRIs
binding-sid Display binding SID information
bmp BGP BMP information
cluster-ids Display configured cluster IDs
community Display routes matching the communities
community-list Display routes matching the community-list
dampening Display detailed information about dampening
detail Display detailed routes
evi Display information about L2 EVPN EVI
extcommunity-list Display routes matching the extcommunity-list
filter-list Display routes conforming to the filter-list
inconsistent-as Display only routes with inconsistent origin ASs
large-community Display routes matching the large communities
largecommunity-list Display routes matching the largecommunity-list
local-vtep Display information for vxlan local-vteps
neighbors Detailed information on TCP and BGP neighbor connections
nexthops Nexthop address table
overlay-mapping Overlay mapping table
path-attribute Display path-attribute specific information
peer-group Display information on peer-groups
pending-prefixes Display prefixes pending deletion
quote-regexp Display routes matching the AS path "regular expression"
rd Display information for a route distinguisher
regexp Display routes matching the AS path regular expression
replication Display replication status of update-group(s)
rib-failure Display bgp routes that failed to install in the routing table (RIB)
rnh Display information for vxlan remote RNH
route-type Display information for certain route-type
sr-policy Display SR Policy information
summary Summary of BGP neighbor status
update-group Display information on update-groups
update-sources Update source interface table
version Display prefixes with matching version numbers
| Output modifiers

xxx-9500-48X_VTEP#sh bgp l2vpn evpn f839.182a.0d51 <<- this MAC address is connected to 10.254.1.3
BGP routing table entry for [2][10.254.1.3:32867][0][48][F839182A0D51][0][*]/20, version 146629
Paths: (2 available, best #2, table EVPN-BGP-Table)
Not advertised to any peer
Refresh Epoch 1
Local
10.254.1.3 (metric 201) (via default) from 10.254.0.2 (10.254.0.2)
Origin incomplete, metric 0, localpref 100, valid, internal
EVPN ESI: 00000000000000000000, Label1 100100
Extended Community: RT:23456:100 ENCAP:8
Originator: 10.254.1.3, Cluster list: 10.254.0.2
rx pathid: 0, tx pathid: 0
Updated on Jul 12 2024 08:55:50 NZST
Refresh Epoch 3
Local
10.254.1.3 (metric 201) (via default) from 10.254.0.1 (10.254.0.1)
Origin incomplete, metric 0, localpref 100, valid, internal, best
EVPN ESI: 00000000000000000000, Label1 100100
Extended Community: RT:23456:100 ENCAP:8
Originator: 10.254.1.3, Cluster list: 10.254.0.1
rx pathid: 0, tx pathid: 0x0
Updated on Jul 12 2024 08:55:50 NZST
BGP routing table entry for [2][10.254.1.3:32867][0][48][F839182A0D51][32][192.168.100.103]/24, version 146626
Paths: (2 available, best #2, table EVPN-BGP-Table)
Not advertised to any peer
Refresh Epoch 1
Local
10.254.1.3 (metric 201) (via default) from 10.254.0.2 (10.254.0.2)
Origin incomplete, metric 0, localpref 100, valid, internal
EVPN ESI: 00000000000000000000, Label1 100100
Extended Community: RT:23456:100 ENCAP:8
Originator: 10.254.1.3, Cluster list: 10.254.0.2
rx pathid: 0, tx pathid: 0
Updated on Jul 12 2024 08:55:50 NZST
Refresh Epoch 3
Local
10.254.1.3 (metric 201) (via default) from 10.254.0.1 (10.254.0.1)
Origin incomplete, metric 0, localpref 100, valid, internal, best
EVPN ESI: 00000000000000000000, Label1 100100
Extended Community: RT:23456:100 ENCAP:8
Originator: 10.254.1.3, Cluster list: 10.254.0.1
rx pathid: 0, tx pathid: 0x0
Updated on Jul 12 2024 08:55:50 NZST
BGP routing table entry for [2][10.254.1.5:32867][0][48][F839182A0D51][0][*]/20, version 146631 <<- why does it have a different ESI if it connects to one leaf?
Paths: (1 available, best #1, table evi_100)
Not advertised to any peer
Refresh Epoch 3
Local, imported path from [2][10.254.1.3:32867][0][48][F839182A0D51][0][*]/20 (global)
10.254.1.3 (metric 201) (via default) from 10.254.0.1 (10.254.0.1)
Origin incomplete, metric 0, localpref 100, valid, internal, best
EVPN ESI: 00000000000000000000, Label1 100100
Extended Community: RT:23456:100 ENCAP:8
Originator: 10.254.1.3, Cluster list: 10.254.0.1
rx pathid: 0, tx pathid: 0x0
Updated on Jul 12 2024 08:55:50 NZST
BGP routing table entry for [2][10.254.1.5:32867][0][48][F839182A0D51][32][192.168.100.103]/24, version 146628
Paths: (1 available, best #1, table evi_100)
Not advertised to any peer
Refresh Epoch 3
Local, imported path from [2][10.254.1.3:32867][0][48][F839182A0D51][32][192.168.100.103]/24 (global)
10.254.1.3 (metric 201) (via default) from 10.254.0.1 (10.254.0.1)
Origin incomplete, metric 0, localpref 100, valid, internal, best
EVPN ESI: 00000000000000000000, Label1 100100
Extended Community: RT:23456:100 ENCAP:8
Originator: 10.254.1.3, Cluster list: 10.254.0.1
rx pathid: 0, tx pathid: 0x0
Updated on Jul 12 2024 08:55:50 NZST
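
For readers unfamiliar with the bracketed route-type 2 entries above, the fields can be pulled apart with a short script. This is a sketch only: the field order (route type, RD, Ethernet tag, MAC length, MAC, IP length, IP) is read off the output above, and `parse_type2` and `summarize` are hypothetical helper names. One possible reading of the output is that the entries under RD 10.254.1.5:32867 sit in table evi_100 and are marked "imported path from ... (global)", which suggests they are local copies of the 10.254.1.3:32867 routes rather than a second advertisement; the script simply surfaces that relationship.

```python
import re

# Field layout inferred from the output above, not from a formal grammar.
NLRI_RE = re.compile(
    r"\[2\]\[(?P<rd>[^\]]+)\]\[(?P<eth_tag>\d+)\]\[(?P<mac_len>\d+)\]"
    r"\[(?P<mac>[0-9A-Fa-f]{12})\]\[(?P<ip_len>\d+)\]\[(?P<ip>[^\]]+)\]"
)


def parse_type2(text):
    """Return a dict of route-type 2 fields found in text, or None."""
    match = NLRI_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    if fields["ip"] == "*":  # '*' is printed when no IP is attached
        fields["ip"] = None
    return fields


def summarize(output):
    """List each table entry with the RD it was imported from, if any."""
    entries = []
    current = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("BGP routing table entry for"):
            current = parse_type2(line)
            if current is not None:
                current["imported_from_rd"] = None
                entries.append(current)
        elif "imported path from" in line and current is not None:
            source = parse_type2(line)
            if source is not None:
                current["imported_from_rd"] = source["rd"]
    return entries


sample = """\
BGP routing table entry for [2][10.254.1.3:32867][0][48][F839182A0D51][0][*]/20, version 146629
Local
BGP routing table entry for [2][10.254.1.5:32867][0][48][F839182A0D51][0][*]/20, version 146631
Local, imported path from [2][10.254.1.3:32867][0][48][F839182A0D51][0][*]/20 (global)
"""

entries = summarize(sample)
for entry in entries:
    print(entry["rd"], entry["mac"], "imported from:", entry["imported_from_rd"])
```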

Hi there,

My apologies, I am new to this and am not sure what you mean. Could you please explain in detail?

Thanks

10.254.1.5:32867

10.254.1.3:32867

 

These two leaves advertise the same MAC, so:

1- Do you have any multihomed links or port channels?

2- Do you have any StackWise Virtual?

Try disabling the NVE on one of the leaves and check whether the error message disappears.

MHM

 

Hey thanks for that 

1 - No, there are no multi-homed links or port channels currently, although port channels from access switches to leaf switches are planned in the future.
2 - Yes, all 9500-16x switches are part of a stackwise virtual. This includes both spines and 2 of the leaf switches.

I have disabled nve on the .3 leaf (Leaf 1, 10.254.1.3). This has resulted in:
a) no errors being logged on that switch (they do still appear on other leaf switches)
b) the client with MAC f839.182a.0d51 is no longer reachable - this makes sense as it is directly connected to that leaf, and without the NVE peering there is no way for its MAC to be known.

@MHM Cisco World After my previous reply I have sequentially shut down the nve interface on each leaf switch. 

Interestingly after shutting it down on leaf4 (10.254.1.6) the errors have stopped. It has only been 10 minutes and I need to monitor for longer but I would normally see multiple errors within 10 minutes.

You might be on to something here...I will monitor and if still no errors I will investigate and report back. Without further investigation I have no idea what is different about leaf4 - it should be set up identical to leaf3.

10.254.1.5 <<- When I said to shut the NVE, I meant on the .5 leaf, not the .3 leaf. The .3 leaf is where the MAC is directly connected, as the show output above indicates.

The errors disappear when leaf4 is shut; that is excellent.

Check:

1- evi-base

2- l2vni-base

3- That you configured the correct IPs on the loopbacks used for BGP and NVE

And make sure leaf4 is not configured as a route server or RR.

Waiting for good news.

Good luck, friend.

MHM

Unfortunately, it is not good news. Not long after posting that, the errors started again; it just took longer than normal for them to start.

I have gone back and shut the .5 leaf nve interface and reviewed the config per your advice:
1) evi base is set via the profile and is identical on all leaf switches
l2vpn evpn profile pot-inter-dc
evi-base 10000
l2vni-base 100000
multicast advertise enable

2) l2vni-base is also set identically on all leaf switches via the profile

3) Loopback interfaces look correct.
NVE is using loopback 1 on all leaf switches and loopback 1 is configured correctly (using 10.254.1.[3-6])
BGP is using loopback 0 on all leaf switches and loopback 0 is configured correctly (using 10.254.0.[3-6])
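
As a quick cross-check of the profile values, the number in the logged error can be compared with `l2vni-base`. This is a sketch under two assumptions that should be verified against Cisco documentation for the running release: that the profile derives each L2 VNI as `l2vni-base` plus the VLAN ID, and that the number before the colon in `rvtep 100102:UNKNOWN` is a VNI. `offset_from_base` is a hypothetical helper.

```python
import re

L2VNI_BASE = 100000  # from the "l2vni-base 100000" line in the profile


def offset_from_base(vni, base=L2VNI_BASE):
    """Return vni - base; under the assumed derivation this is the VLAN ID."""
    return vni - base


# Error line copied from the original post.
log = "*Jul 9 01:42:31.759: NVE-MGR-EI: L2FIB rvtep 100102:UNKNOWN cfg type 2 for BD 102"

match = re.search(r"rvtep (\d+):(\S+) cfg type (\d+) for BD (\d+)", log)
vni = int(match.group(1))      # assumed to be the VNI
peer = match.group(2)          # remote VTEP address, printed as UNKNOWN here
bd = int(match.group(4))       # bridge domain from the log line

print("offset from l2vni-base:", offset_from_base(vni))
print("bridge domain in log:", bd)
print("peer address field:", peer)
```

If the assumptions hold, the offset and the bridge domain agree (both 102), so the VNI numbering itself looks consistent and the part of the message worth chasing is the UNKNOWN peer address.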

Leaf switches are not set as RR - Leaf BGP config below as a reference, only router-id is different on each leaf switch:

router bgp 65550
bgp router-id 10.254.0.6
bgp log-neighbor-changes
no bgp default ipv4-unicast
neighbor 10.254.0.1 remote-as 65550
neighbor 10.254.0.1 update-source Loopback0
neighbor 10.254.0.2 remote-as 65550
neighbor 10.254.0.2 update-source Loopback0
!
address-family ipv4
exit-address-family
!
address-family l2vpn evpn
neighbor 10.254.0.1 activate
neighbor 10.254.0.1 send-community both
neighbor 10.254.0.2 activate
neighbor 10.254.0.2 send-community both
exit-address-family


Note: NVE peering is established between all leaf switches (except .5 which is shutdown).
Note: BGP is established between all leaf switches and the two spine switches. There is no BGP established directly between leaf switches and other leaf switches.

Hi friend 

Sorry, I don't understand your last reply.

Did you shut down the NVE on the .5 leaf or not?

The MAC is connected to the .3 leaf, and that leaf advertises it.

MHM

Hey there,

Yes I shut down NVE on leaf .5 but unfortunately the errors are still occurring. Yes the MAC is connected to leaf .3 and that host is still reachable after shutting down NVE on leaf .5.

Thanks

1- Share the output of this:

show l2vpn evpn evi <> detail

2- Check this bug:

https://bst.cisco.com/bugsearch/bug/CSCvw54690?rfs=qvlogin

Thanks 

MHM

Hey thanks for sharing that - I have attached the output of "Show l2vpn evpn evi <> detail" for the 4x EVIs we currently have configured for all 4 leaf switches so you can compare.

Reading through that bug ID I don't think it applies here as the NVE peering is not flapping and RMACs are being learned, for example:

Interface  VNI     Type  Peer-IP     RMAC/Num_RTs  eVNI    state  flags  UP time
nve1       100100  L2CP  10.254.1.3  4             100100  UP     N/A    1d01h
nve1       100100  L2CP  10.254.1.5  4             100100  UP     N/A    1d00h
nve1       100100  L2CP  10.254.1.6  4             100100  UP     N/A    1d00h


Hi,

Thanks for your efforts with this. As requested I have attached the text file output from each leaf for your review.

Note: Per your instruction and topology there is only a single host connected and it is attached to leaf1 (.3 IP address).

Note: Because only one host is attached, only leaf1 is advertising routes to the spine switches. No other leaf switches are advertising routes as there is no attached hosts to advertise.

Note: With only one host connected there is no NVE peering between leaf switches and so the errors have stopped.

How does one host with the same IP appear with different MACs in all the EVIs?
Or did you connect it to each VLAN?

MHM
