Hello Friends!
I have a simple VXLAN EVPN environment with 5672 switches as leafs and 7706 switches as border spines.
All leafs are grouped into vPC pairs.
All VLANs exist on all vPC pairs, and the same anycast gateway with the same anycast MAC address is configured on them.
So, there is nothing special :).
Recently I noticed a large number of host reachability problems in my environment.
Hosts partially lose connectivity (a host is reachable from some places in our network but not from others).
I also saw that the iBGP table version count is tremendously high (for the size of my environment).
After some investigation, I understood that the problem arises only on "orphan-connected" hosts. (Yes, orphan hosts aren't a good idea, but unfortunately for now we have to maintain this type of connection in our environment.)
ARP entries for these orphan hosts are continuously flapping (on the leafs of the vPC pair the hosts are connected to); entries live no longer than 1-3 seconds.
The MAC table is stable on both vPC members during this time; all entries point to the right interfaces.
This causes continuous iBGP updates (the leafs withdraw and re-advertise EVPN routes as the ARP table flaps) and sends the iBGP table version count to the moon :)
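To pick out the affected hosts, something like the following might help. It is only a rough sketch, not a tested tool: it assumes you already poll the ARP table at a fixed interval and have the entry age in seconds per host IP (the `samples` layout, the `flapping_hosts` name and the 3-second threshold are my own assumptions; collecting the data from the switch is not shown).

```python
from typing import Dict, List, Optional


def flapping_hosts(samples: Dict[str, List[Optional[float]]],
                   max_age: float = 3.0,
                   min_samples: int = 5) -> List[str]:
    """Return IPs whose ARP entry looks like it is flapping.

    samples maps an IP to the ARP entry ages (in seconds) seen at each
    poll; None means the entry was missing or incomplete at that poll.
    The data layout is hypothetical, not a switch output format.
    """
    suspects = []
    for ip, ages in samples.items():
        if not ages or len(ages) < min_samples:
            continue  # too few polls to judge this host
        present = [a for a in ages if a is not None]
        # Suspicious if the entry was ever missing, or if it never
        # survived longer than max_age seconds across all polls.
        if len(present) < len(ages) or max(present) <= max_age:
            suspects.append(ip)
    return sorted(suspects)


if __name__ == "__main__":
    polls = {
        "10.0.0.10": [1.0, 2.0, None, 1.0, 3.0],      # orphan host, flapping
        "10.0.0.20": [30.0, 40.0, 50.0, 60.0, 70.0],  # healthy host
    }
    print(flapping_hosts(polls))
```

A healthy entry keeps aging between polls, so it crosses the threshold quickly; an entry that is re-created every 1-3 seconds never does.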
So, the flow of the issue in the BAD case (from my perspective):
- The orphan host is connected to vPC Peer A
- Peer A completes the MAC entry and the ARP entry, but for some reason does not synchronize the ARP entry with Peer B (or synchronizes it only partially)
- When a packet for the host lands on Peer B, Peer B has no complete ARP entry and tries to resolve it by sending an ARP request via the peer-link
- The request goes to Peer A and reaches the destination host, which generates an ARP reply
- This ARP reply refreshes the ARP entry on Peer A (and is absorbed there because of the anycast GW), but for some reason Peer A and Peer B still don't synchronize the ARP entry via CFSoE
- Every subsequent packet that lands on Peer B repeats this cycle and causes the ARP flapping
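The steps above can be written down as a small toy model. This is just my hypothesis in code, not how NX-OS actually works internally: the `Peer` class, the `simulate` function and the host address are made up for illustration, and the refresh of the entry on Peer A is reduced to a single set update.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Peer:
    name: str
    arp: Set[str] = field(default_factory=set)  # IPs with a complete ARP entry


def simulate(packets_to_b: int, sync_works: bool) -> int:
    """Count the ARP requests Peer B sends for one orphan host behind Peer A."""
    host = "10.0.0.10"  # hypothetical orphan host address
    peer_a, peer_b = Peer("A"), Peer("B")

    peer_a.arp.add(host)      # Peer A learns the host locally
    if sync_works:
        peer_b.arp.add(host)  # entry is copied to Peer B

    requests = 0
    for _ in range(packets_to_b):
        if host not in peer_b.arp:
            requests += 1         # Peer B sends an ARP request via the peer-link
            peer_a.arp.add(host)  # the reply is absorbed by Peer A (anycast GW)
            if sync_works:
                peer_b.arp.add(host)  # only reaches Peer B if sync works
    return requests


if __name__ == "__main__":
    print("sync OK:    ", simulate(5, sync_works=True))
    print("sync broken:", simulate(5, sync_works=False))
```

In this model, with working synchronization Peer B never has to send a request, while with broken synchronization every packet that lands on Peer B triggers a new one, which matches the flapping I see.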
It's interesting behavior, most likely a bug. It is independent of the connection type (directly connected or FEX-connected).
Maybe this post will help someone who runs into a similar problem.