
EVPN/VXLAN-Multi Site Layer 2 issue

kmarkov
Level 1

Dear all,
I am running into a very strange problem. I am currently preparing, in our lab, a migration of an EVPN/VXLAN setup from single-site to multi-site, and during testing I discovered the following.
The lab is already in a multi-site setup: within the local site we use multicast replication, and the multi-site interconnect uses ingress replication.
The lab consists of 2 sites. Site A has a single BGW-Spine with 2 leafs in a Fabric VPC setup, and Site B has the same setup: 1 BGW-Spine and 2 leafs in Fabric VPC. In each site I have a single-homed client connected to one leaf, and the clients sit in the same VLAN.
The problem is the following. When I try to ping Client B from Client A in the same VLAN, the traffic does not pass unless arp-suppression is enabled. When I ping from Client A to a client in Site B in a different VLAN (Layer 3 traffic), the connection works without arp-suppression. I want to understand why arp-suppression seems mandatory in my case in order to achieve Layer 2 reachability between the sites. In the past we had serious problems with arp-suppression in our single-site fabric.

Any ideas would be highly appreciated.

AshSe
VIP

Hello @kmarkov 

The issue you're encountering is related to how ARP (Address Resolution Protocol) is handled in an EVPN/VXLAN multi-site setup, particularly when using ingress replication for inter-site communication. Let me break this down and explain why ARP suppression is playing a critical role in your scenario.

Key Concepts in Your Setup:

  1. Multicast Replication (Intra-Site):

    • Within a single site, multicast is used for BUM (Broadcast, Unknown unicast, and Multicast) traffic replication. This ensures that ARP requests and other broadcast traffic are properly distributed to all relevant devices within the site.
  2. Ingress Replication (Inter-Site):

    • Between sites, ingress replication is used for BUM traffic. This means that the BGW (Border Gateway) in each site replicates BUM traffic to the other sites over unicast tunnels. This is less efficient than multicast but is often used in multi-site setups where multicast is not available or not desired across the interconnect.
  3. ARP Suppression:

    • ARP suppression is a feature of EVPN that reduces the need for broadcast ARP requests by using the EVPN control plane to resolve MAC-to-IP mappings. When a device sends an ARP request, the leaf switch can respond directly using information learned from EVPN advertisements, avoiding the need to flood the ARP request as a broadcast.
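To make the ARP suppression idea concrete, here is a minimal Python sketch of the lookup a leaf performs. It is only a conceptual model, not how any switch implements it: the class name SuppressionCache, its methods, and the sample IP/MAC values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SuppressionCache:
    """Hypothetical per-VNI table of IP -> MAC bindings learned from EVPN MAC/IP routes."""
    bindings: Dict[str, str] = field(default_factory=dict)

    def learn(self, ip: str, mac: str) -> None:
        # Called when a MAC/IP advertisement arrives from the control plane.
        self.bindings[ip] = mac

    def withdraw(self, ip: str) -> None:
        # Called when the advertisement is withdrawn; ignore unknown entries.
        self.bindings.pop(ip, None)

    def handle_arp_request(self, target_ip: str) -> Optional[str]:
        """Return the MAC for a local proxy reply, or None if the request must be flooded."""
        return self.bindings.get(target_ip)


cache = SuppressionCache()
cache.learn("10.1.1.20", "00:00:5e:00:53:0b")  # sample binding for a remote client

hit = cache.handle_arp_request("10.1.1.20")    # known host: answered by the leaf
miss = cache.handle_arp_request("10.1.1.99")   # unknown host: falls back to BUM flooding

print("reply locally with", hit)
print("flood as BUM" if miss is None else miss)
```

The point of the sketch is the fallback: a cache hit never touches the BUM path, while a miss depends on flooding working end to end.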

Why ARP Suppression is Mandatory in Your Case:

  1. Inter-Site ARP Broadcasts with Ingress Replication:

    • Without ARP suppression, when Client A in Site A sends an ARP request for Client B in Site B, the ARP request is treated as broadcast traffic (BUM). In your setup, ingress replication is used for inter-site BUM traffic. However, ingress replication relies on the BGWs to replicate and forward this traffic to the other site.
    • If the BGWs are not properly configured to handle ARP broadcasts or if there are limitations in how ingress replication handles BUM traffic across sites, the ARP request may not reach Site B, and Client A will not be able to resolve the MAC address of Client B.
  2. ARP Suppression Eliminates the Need for Broadcasts:

    • With ARP suppression enabled, the leaf switch in Site A can respond to Client A's ARP request directly using the EVPN control plane. The MAC-to-IP mapping for Client B is learned and distributed via EVPN, so there is no need to send a broadcast ARP request across the inter-site link. This ensures that Client A can resolve Client B's MAC address and establish Layer 2 connectivity.
  3. Layer 3 Traffic Works Without ARP Suppression:

    • When you ping a client in a different VLAN (Layer 3 traffic), the traffic is routed. In this case, the ARP resolution happens locally on the gateway (typically the leaf switch or BGW), and the inter-site communication is handled as routed traffic. This does not rely on BUM traffic or ARP broadcasts, which is why it works even without ARP suppression.
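The dependency on the flood path described above can be sketched as a simple reachability check. This is a deliberately simplified model under stated assumptions: each node just has a list of peers it replicates broadcast frames to (standing in for the multicast group inside a site and for the ingress-replication peer list between BGWs), and split-horizon rules are ignored. The node names (leaf-a, bgw-a, bgw-b, leaf-b) and the flood function are hypothetical.

```python
from typing import Dict, List, Set


def flood(flood_lists: Dict[str, List[str]], source: str) -> Set[str]:
    """Return every node that receives a broadcast frame injected at `source`."""
    reached: Set[str] = {source}
    pending = [source]
    while pending:
        node = pending.pop()
        # Each node replicates the frame to every peer in its own flood list.
        for peer in flood_lists.get(node, []):
            if peer not in reached:
                reached.add(peer)
                pending.append(peer)
    return reached


# Both BGWs list each other as replication peers: the ARP broadcast crosses sites.
healthy = {
    "leaf-a": ["bgw-a"],
    "bgw-a": ["leaf-a", "bgw-b"],
    "bgw-b": ["bgw-a", "leaf-b"],
    "leaf-b": ["bgw-b"],
}

# bgw-a has no remote peer in its replication list: the broadcast stays in Site A.
broken = dict(healthy, **{"bgw-a": ["leaf-a"]})

print("healthy reaches leaf-b:", "leaf-b" in flood(healthy, "leaf-a"))
print("broken reaches leaf-b:", "leaf-b" in flood(broken, "leaf-a"))
```

In the second case, an ARP request from Client A can only be answered if the local leaf already knows Client B's binding from the control plane, which is the situation where ARP suppression masks a broken flood path.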

Why You Had Problems with ARP Suppression in the Past:

In single-site setups, ARP suppression can sometimes cause issues if:

  1. The EVPN control plane is not properly synchronized, leading to stale or missing MAC-to-IP mappings.
  2. There are bugs or misconfigurations in the EVPN implementation on your devices.
  3. The ARP suppression feature is not fully supported or behaves inconsistently in your hardware/software version.

These issues can lead to scenarios where ARP requests are not properly resolved, causing connectivity problems. However, in a multi-site setup with ingress replication, ARP suppression becomes more critical because it eliminates the dependency on inter-site BUM traffic for ARP resolution.


Recommendations:

  1. Enable ARP Suppression:

    • Since ARP suppression is critical for inter-site Layer 2 connectivity in your setup, ensure that it is enabled and properly configured on all leaf switches.
  2. Verify EVPN Control Plane:

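    • Check that the EVPN control plane is functioning correctly and that MAC-to-IP mappings are being advertised and learned across all devices. On Cisco NX-OS, commands such as show bgp l2vpn evpn, show l2route evpn mac-ip all, and show nve peers are typical starting points; use the equivalent on other platforms.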
  3. Inspect BGW Configuration:

    • Ensure that the BGWs are correctly configured to handle ingress replication for BUM traffic. Verify that inter-site BUM traffic is being forwarded as expected.
  4. Test with Multicast Replication:

    • If possible, test using multicast replication for inter-site BUM traffic instead of ingress replication. This can help determine if the issue is specific to ingress replication.
  5. Upgrade Software:

    • If you experienced issues with ARP suppression in the past, consider upgrading to a more recent and stable software version that addresses known bugs or limitations in the EVPN/VXLAN implementation.

Conclusion:

In your multi-site EVPN/VXLAN setup, ARP suppression is mandatory for inter-site Layer 2 connectivity because it eliminates the dependency on BUM traffic for ARP resolution. Without ARP suppression, ARP broadcasts may not be properly forwarded across sites due to the limitations of ingress replication. By enabling ARP suppression and ensuring the EVPN control plane is functioning correctly, you can achieve seamless Layer 2 connectivity between sites.

 

Hope This Helps!!!

AshSe
