
BGP EVPN VXLAN Silent Host | Asymmetric NVE Peers on Catalyst 9500

pwtn
Level 1

I have a Layer 2 BGP EVPN VXLAN setup on some Catalyst 9500 Series switches. My problem is with one specific VNI: its peering shows as up on only one of the two peers, even though both peers have local clients in the VLAN. I can force the peering up by configuring an SVI on the remote leaf. Even with a "no ip address" command, the SVI seems to force the MAC routes to propagate in BGP and MAC learning to occur. I suspect a silent host problem. I will try to explain it here in the hope that someone can tell me what I'm doing wrong.

Scenario

Node A has many local hosts in VLAN 501. It is a BGP route reflector and also the border node between the L2 EVPN network and a native Layer 2 network (Layer 2 handoff).

Node B has only one local host in VLAN 501, attached to a Layer 2 access switch. It is a leaf node and communicates with the border node via the fabric (routed underlay).

The two nodes can reach each other over the Layer 3 underlay network. They have L2VPN EVPN BGP peerings and multicast replication. This is all pretty much done as per the official guide.
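For reference, here is a minimal sketch of a VLAN-based L2VNI with underlay multicast replication on a Catalyst 9500 running IOS XE. It follows the general structure of Cisco's Catalyst 9000 BGP EVPN VXLAN configuration guide and is shown from the leaf (Node B) side. It is not the actual configuration from this setup. The VNI (10501), multicast group (239.1.1.101), loopback numbers, AS number (65001), and neighbor address (10.255.0.1, standing in for Node A) are all hypothetical placeholders. The sketch also assumes VLAN 501 already exists.

```text
! Hypothetical values throughout: VNI 10501, group 239.1.1.101,
! AS 65001, neighbor 10.255.0.1 (Node A). Adjust to the real design.
l2vpn evpn
 ! static = BUM traffic is replicated using underlay multicast
 replication-type static
 router-id Loopback1
!
l2vpn evpn instance 501 vlan-based
 encapsulation vxlan
!
! Map VLAN 501 to the EVPN instance and its L2 VNI
vlan configuration 501
 member evpn-instance 501 vni 10501
!
interface nve1
 no ip address
 source-interface Loopback1
 host-reachability protocol bgp
 member vni 10501 mcast-group 239.1.1.101
!
router bgp 65001
 neighbor 10.255.0.1 remote-as 65001
 neighbor 10.255.0.1 update-source Loopback0
 address-family l2vpn evpn
  neighbor 10.255.0.1 activate
  neighbor 10.255.0.1 send-community both
```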

Problem

The single host attached to the leaf node (Node B) is a fairly silent host. It has a static IP address configuration, and it can sometimes go hours without trying to communicate with anything. When I add the VTEP/NVE configuration for VLAN 501 to Node B, I see Type-2 BGP EVPN routes coming in from Node A for many MAC addresses in VLAN 501. On Node A, however, I see no BGP routes coming from Node B. The NVE/VNI peering for VLAN 501 shows as DOWN on Node A, which I assume is because no BGP routes are being learned from Node B. The NVE/VNI peering for VLAN 501 on Node B shows as UP. In this state, the single remote client on Node B is unreachable from the Node A side of VLAN 501. I assume this is because Node A has no BGP routes and no NVE peer for the VNI.
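One way to capture the asymmetric state described above is to run the same standard IOS XE show commands on both nodes and compare the output side by side. The comments describe what to look for in this scenario. They are not output taken from this setup.

```text
! Run on both Node A and Node B, then compare.
! NVE peers per VNI: Node A would be expected to lack Node B for VLAN 501's VNI
show nve peers
! State of each VNI configured on the local NVE interface
show nve vni
! EVPN routes in BGP: look for Type-2 (MAC) routes originated by Node B
show bgp l2vpn evpn
```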

Troubleshooting reveals that the MAC address of the single remote client connected to Node B is no longer in Node B's MAC address table. The client is still attached to the same port, in the same VLAN, and so on. I found that creating an SVI for VLAN 501 on Node B makes the client's MAC address appear in Node B's MAC address table. The BGP routes are then advertised to Node A, the NVE peering comes up on Node A, and the remote client on Node B becomes reachable. This holds even if I configure "no ip address" on the VLAN 501 SVI on Node B. As long as an SVI is present, the MAC entries seem to remain in the table, the BGP routes and NVE peerings stay in place on Node A, and the single remote client in VLAN 501 on Node B stays reachable.

If I completely remove the SVI from Node B with "no interface vlan 501", the client MAC eventually times out of Node B's MAC address table (after a few minutes). The BGP routes are then withdrawn, and the NVE peer goes down on Node A. The remote client is once again unreachable from the Node A side of the VLAN.
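The "few minutes" observed here is consistent with the Catalyst default dynamic MAC aging time of 300 seconds, although that is an inference and not something confirmed in this thread. The standard IOS XE show commands below can confirm the timer in effect for VLAN 501 on Node B and show when the host's entry ages out after the SVI is removed.

```text
! Aging timer currently in effect for VLAN 501 (Catalyst default: 300 seconds)
show mac address-table aging-time vlan 501
! Repeat after "no interface vlan 501" to see when the host's dynamic entry disappears
show mac address-table dynamic vlan 501
```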

Additional Info

Node B also has an EVPN setup for VLAN 600, which also has a single host. However, this VLAN does not seem to experience the asymmetric NVE peering or the loss of MAC/BGP information. The EVPN configuration for both VLANs is pretty much the same apart from the VNI numbers and the multicast replication group IP. My best guess is that the VLAN 501 host on Node B is just not sending enough traffic to keep its MAC entry in the table, so the BGP routes get withdrawn. What I cannot explain is why an SVI with no IP address on Node B forces the MAC information to remain in the MAC address table.

3 Replies

Enes Simnica
Level 1

hey there @pwtn. It is hard to find the right solution without seeing any configuration or show command output. Even so, I believe the issue you're facing is related to MAC address aging for the silent host in VLAN 501. Because the host generates little traffic, its MAC address times out. Node B then withdraws its BGP EVPN routes, which leads to the asymmetric peering behavior.

When you configure an SVI (even with no IP), it keeps the MAC in the table, allowing Node B to advertise the necessary Type 2 routes and keep the VXLAN peering active on Node A.

To fix this, you can increase the MAC aging timer on Node B for VLAN 501, which will prevent the MAC from aging out so quickly. Here's how you can increase the MAC aging timer:

 

! Per-VLAN MAC aging is set in global configuration mode with the vlan keyword
#configure terminal
(config)#mac address-table aging-time .... vlan 501

This should help keep the MAC entry in the table and prevent the peering from going down. I believe it should address the issue; let's see.

-Enes


Upon further investigation I can confirm the problem (loss of connection) never happens when I have the following config on Node B:

interface Vlan501
no ip address
end

As soon as I delete this interface, before the MAC has even timed out, I lose connectivity and cannot ping the silent host from hosts in VLAN 501 attached to Node A. As soon as I add the interface back, pings start working again. A packet capture on the silent host's port shows that all traffic destined to the silent host (polling, etc.) stops flowing as soon as I remove the SVI from Node B.

The silent host is attached to a downstream Layer 2 switch (9300) via an 802.1Q trunk link. It's almost as if this trunk breaks when I remove the SVI from Node B, and then the MAC times out. Yet the trunk and the local STP instance seem fine. All other VLANs are certainly still working, and there are no log messages indicating a fault.
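To check the observation that the trunk itself is healthy, you can compare both ends of the 802.1Q link (Node B and the downstream 9300) with and without the SVI present. The standard IOS XE commands below cover that comparison. No specific interface names are assumed. What to look for is described in the comments.

```text
! On Node B and on the downstream 9300, with and without the SVI on Node B:
! VLAN 501 should appear under "allowed", "allowed and active in management
! domain" and "in spanning tree forwarding state and not pruned"
show interfaces trunk
! Port roles and states for VLAN 501 on each switch
show spanning-tree vlan 501
! On the 9300: the silent host's MAC should be learned on its access port
show mac address-table vlan 501
```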
