02-03-2020 12:07 AM
I have an IP flapping issue in a Cisco ACI environment.
Topology:
I found that when 168.1.37.129 sends ICMP replies to 168.1.37.45, the replies are sent to SW13 and SW14 at the same time. The replies sent to SW13 carry source IP 168.1.37.129 and source MAC d9bc; the replies sent to SW14 carry source IP 168.1.37.129 and source MAC 66ec. In other words, this is an "IP flapping" issue.
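To make the flapping concrete, here is a minimal toy model in Python of an IP-to-(MAC, switch) table. It is only a sketch: the `EndpointTable` class and its `learn` method are hypothetical names for illustration and do not reflect how ACI actually stores endpoints. It just shows that when the same IP keeps arriving with two different source MACs on two switches, each new sighting overwrites the previous one and counts as a move.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointTable:
    """Toy IP-to-(MAC, switch) table; not ACI's real data structure."""
    entries: dict = field(default_factory=dict)
    moves: int = 0

    def learn(self, ip: str, mac: str, switch: str) -> bool:
        """Record the latest sighting of ip; return True if it counts as a move."""
        previous = self.entries.get(ip)
        self.entries[ip] = (mac, switch)
        moved = previous is not None and previous != (mac, switch)
        if moved:
            self.moves += 1
        return moved


table = EndpointTable()
# Replies from 168.1.37.129 alternate between the two NICs of the team.
sightings = [("d9bc", "SW13"), ("66ec", "SW14")] * 3
for mac, switch in sightings:
    table.learn("168.1.37.129", mac, switch)

# The entry always reflects the most recent sighting; every change is a move.
print(table.entries["168.1.37.129"], table.moves)
```

Whichever MAC was seen last is the one the table reports, which matches the show endpoints output alternating between d9bc and 66ec.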
Problem:
In this case, when 168.1.37.45 pings 168.1.37.129 continuously, it receives ICMP replies without interruption at first, but after more than 10 minutes the replies suddenly stop. At that point, the show endpoints command on the APIC lists d9bc as the MAC associated with 168.1.37.129. After a few minutes, 168.1.37.45 receives ICMP replies again, and the show endpoints command on the APIC lists 66ec as the MAC associated with 168.1.37.129.
Solution:
Enable ARP flooding.
I think this feature can resolve the problem, but it does not explain the root cause.
Questions:
1. I want to know the root cause of 168.1.37.45 suddenly not receiving ICMP replies.
2. The server uses active/active NIC teaming without a vPC configuration, so it can cause IP flapping. Does this behavior have anything to do with "endpoint loop protection" or "rogue endpoint control"?
3. Are there other ways to resolve this problem without enabling ARP flooding?
02-05-2020 06:48 PM
This is typical ACI endpoint learning behavior when active/active teaming without LACP is configured. The recommended approach is to use LACP. You can check whether the issue is still present by running the following command on the leaf switch:
zcat /mnt/ifc/log/epmc-trace* | grep moved | grep "<YOUR MAC>"
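If you would rather get a count than scroll through the matches, the same filter can be sketched in Python. This assumes the trace files have been copied somewhere a Python interpreter is available and that they are gzip-compressed; `count_moves` is a hypothetical helper name, and nothing is assumed about the log line format beyond what the grep pipeline already relies on (the line contains "moved" and the MAC).

```python
import glob
import gzip


def count_moves(pattern: str, mac: str) -> int:
    """Count lines in gzip-compressed files that contain both 'moved' and mac."""
    total = 0
    for path in sorted(glob.glob(pattern)):
        # Text mode with errors="replace" so stray bytes do not abort the scan.
        with gzip.open(path, "rt", errors="replace") as handle:
            for line in handle:
                # Same case-sensitive substring filters as the two grep stages.
                if "moved" in line and mac in line:
                    total += 1
    return total
```

For example, `count_moves("/mnt/ifc/log/epmc-trace*", "<YOUR MAC>")` returns the number of matching lines; a count that keeps growing between runs suggests the endpoint is still moving.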
If "rogue endpoint" detection is enabled, ACI stops learning the endpoint for a certain period when severe flapping occurs, so the endpoint is not reachable at all during that time. Endpoint flapping causes packet loss when it is severe enough, and can sometimes even cause EPM/EPMC to crash.
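The general idea of "too many moves in a short time, so stop learning for a while" can be sketched as a toy model in Python. This is not ACI's implementation: the `FlapGuard` class is a hypothetical name, and the threshold, window and hold values below are purely illustrative and are not ACI defaults.

```python
class FlapGuard:
    """Toy move-rate limiter; thresholds are illustrative, not ACI defaults."""

    def __init__(self, max_moves: int, window_s: float, hold_s: float):
        self.max_moves = max_moves
        self.window_s = window_s
        self.hold_s = hold_s
        self.move_times = []
        self.frozen_until = float("-inf")

    def allow_learning(self, now: float) -> bool:
        """Learning is allowed once any hold period has expired."""
        return now >= self.frozen_until

    def record_move(self, now: float) -> bool:
        """Register a move at time now; return False if learning is frozen."""
        if not self.allow_learning(now):
            return False
        # Keep only the moves that fall inside the detection window.
        self.move_times = [t for t in self.move_times if now - t < self.window_s]
        self.move_times.append(now)
        if len(self.move_times) >= self.max_moves:
            # Too many moves in the window: freeze learning for hold_s seconds.
            self.frozen_until = now + self.hold_s
            self.move_times.clear()
        return True


guard = FlapGuard(max_moves=3, window_s=10, hold_s=100)
results = [guard.record_move(t) for t in (0, 2, 4, 6, 50, 120)]
print(results)  # [True, True, True, False, False, True]
```

In this model the third move inside the window triggers the hold, moves during the hold are ignored, and learning resumes only after the hold expires. That is the kind of pattern that would look like an endpoint disappearing for a while and then coming back.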
If you can't reconfigure the server to use LACP or Active/Standby, there are two options that I am aware of:
1. Disable dataplane learning: this is a BD-wide setting that makes ACI stop learning endpoints via the data plane within that specific BD. In 4.0, there is also an option to disable dataplane learning at the VRF level.
2. Move the gateway outside of ACI, so that ACI is not learning any endpoints at all.
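If you prefer to script the BD change rather than click through the GUI, here is a rough Python sketch that only builds a JSON body. It assumes the bridge domain is exposed as an `fvBD` object with `arpFlood` and `ipLearning` attributes taking "yes"/"no"; please verify this against the object model of your APIC version before using it. The helper name `bd_payload` and the BD name "BD1" are hypothetical, and the REST URL, tenant and authentication are deliberately left out.

```python
import json


def bd_payload(bd_name: str, arp_flood=None, ip_learning=None) -> str:
    """Build a JSON body for an fvBD object with only the attributes given."""
    attributes = {"name": bd_name}
    # Assumed attribute names; only included when the caller sets them.
    if arp_flood is not None:
        attributes["arpFlood"] = "yes" if arp_flood else "no"
    if ip_learning is not None:
        attributes["ipLearning"] = "yes" if ip_learning else "no"
    return json.dumps({"fvBD": {"attributes": attributes}}, sort_keys=True)


# Hypothetical BD name; this body would turn off dataplane IP learning on the BD.
print(bd_payload("BD1", ip_learning=False))
```

Leaving an argument as `None` keeps that attribute out of the body, so the sketch changes only the setting you actually ask for.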
I also recommend that you read the endpoint learning white paper.
02-07-2020 04:33 AM
02-20-2020 11:16 AM
02-20-2020 11:34 AM