I am still learning my way through ACI and would like some help with an EPG learning IP addresses outside its BD. Our ACI setup is 1 subnet per BD, and each BD is associated with 1 EPG. Yesterday I was physically migrating a host from our old network to ACI, and I created a new BD and EPG for this particular host. For the BD, I believe I might have left "Limit IP learning to subnet" unchecked before I added the subnet details and enabled unicast routing.
The migrated host (private address) was pingable for a short time after the migration and then went offline. I also noticed at the time that hosts in other EPGs (and therefore other BDs) started to lose connectivity (they could not be pinged from the rest of the network). I had very little time to troubleshoot, but when I went into the EPG I discovered that this host's MAC address had a lot of random IPs against it. Some of them are our public IP addresses, which are in a different physical location and not on ACI.
My first question: any idea why the MAC address is associated with so many random IP addresses that don't relate to this system at all? Is this MAC-to-IP association the same as ARP?
At the time, I checked the other EPGs whose BDs have a subnet and unicast routing enabled on ACI, and it seems that some hosts had also lost their previously learnt IP addresses.
Once I rolled back the change by removing the subnet from the BD and the static port binding on the EPG, everything came right. I tried the change again 2 hours later, this time with "Limit IP learning to subnet" enabled first, and everything seems to work without any issues.
I just learnt this morning that the sys admin was seeing a lot of alerts for many other systems losing connectivity during the first attempt at migrating that host.
The APIC is on 2.1(1i), the fabric is currently on 11.2(3c), due to be upgraded next week.
Anytime a leaf receives L3 traffic on a front panel port (interface not connected to spine), it will retain both the source mac and IP address as an endpoint. Multiple IP addresses will be learned on the same mac if we receive traffic from that mac/IP combination. It will be installed as a single endpoint with a single mac and multiple IP addresses.
The IP address will be associated with that mac until either:
a) leaf receives traffic from that IP with a different mac address (also known as IP move)
b) we no longer receive traffic from the mac address and it gets cleared out of the endpoint table
^ Even if that IP stops sending traffic, we will still retain that IP as an endpoint as long as the mac still sends traffic
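To make the rules above concrete, here is a minimal Python sketch of the learning behavior as described in this post. It is only a toy model, not ACI code: the class and method names (EndpointTable, learn, age_out) are made up for illustration, and real leaf behavior involves timers and hardware tables that are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    mac: str
    ips: set = field(default_factory=set)


class EndpointTable:
    """Toy model of a leaf endpoint table: one entry per MAC, many IPs."""

    def __init__(self):
        self.by_mac = {}    # mac -> Endpoint
        self.ip_owner = {}  # ip -> mac that currently holds that IP

    def learn(self, src_mac, src_ip):
        """Record the source MAC/IP of one routed packet seen on a front panel port."""
        old_mac = self.ip_owner.get(src_ip)
        if old_mac is not None and old_mac != src_mac:
            # Case (a): IP move. The IP is removed from the old MAC's endpoint.
            self.by_mac[old_mac].ips.discard(src_ip)
        endpoint = self.by_mac.setdefault(src_mac, Endpoint(src_mac))
        endpoint.ips.add(src_ip)
        self.ip_owner[src_ip] = src_mac

    def age_out(self, mac):
        """Case (b): the MAC went silent, so the endpoint and all its IPs go."""
        endpoint = self.by_mac.pop(mac, None)
        if endpoint is not None:
            for ip in endpoint.ips:
                self.ip_owner.pop(ip, None)


table = EndpointTable()
table.learn("00:00:00:00:00:aa", "10.1.1.10")
table.learn("00:00:00:00:00:aa", "192.0.2.55")  # same MAC, second source IP
table.learn("00:00:00:00:00:bb", "10.1.1.10")   # same IP from another MAC: IP move
```

After these three packets, the first MAC still holds 192.0.2.55 and the second MAC holds 10.1.1.10. A router forwarding traffic for many sources would show up in this model as one MAC with a long list of IPs.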
There is not enough evidence to know exactly what happened, but it does sound like traffic entered the leaf with many different combinations of that mac and many IPs. Enabling the 'Limit IP learning to subnet' feature may relieve the symptoms, but in your case it sounds like there is still an underlying issue.
If possible, it would be a good idea to isolate the issue. You could try disabling the limit IP learning feature and troubleshooting from there (call TAC if necessary).
A couple of things to consider:
1. Ensure there are no endpoint flaps happening (for example, IP is not constantly bouncing between two different macs)
2. Review your design and look for areas with potential loops
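For point 1, one way to spot a flap is to count how often an IP changes MAC over a series of observations. The sketch below assumes you have already collected (IP, MAC) pairs in time order from somewhere (how you collect them is not shown); the function name find_flapping_ips and the min_moves threshold are hypothetical choices for illustration.

```python
from collections import defaultdict


def find_flapping_ips(observations, min_moves=3):
    """Return {ip: move_count} for IPs that changed MAC at least min_moves times.

    observations: iterable of (ip, mac) tuples in the order they were seen.
    """
    last_mac = {}             # ip -> MAC it was last seen on
    moves = defaultdict(int)  # ip -> number of times it changed MAC
    for ip, mac in observations:
        previous = last_mac.get(ip)
        if previous is not None and previous != mac:
            moves[ip] += 1
        last_mac[ip] = mac
    return {ip: count for ip, count in moves.items() if count >= min_moves}


seen = [
    ("10.1.1.10", "00:00:00:00:00:aa"),
    ("10.1.1.10", "00:00:00:00:00:bb"),
    ("10.1.1.10", "00:00:00:00:00:aa"),
    ("10.1.1.10", "00:00:00:00:00:bb"),
    ("10.1.1.20", "00:00:00:00:00:cc"),
]
flapping = find_flapping_ips(seen)
```

Here 10.1.1.10 bounces between two MACs three times and is reported, while 10.1.1.20 never moves and is not.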
Thanks for your detailed explanation. So basically, any time the leaf port receives a packet, the source MAC and source IP will be recorded as an End Point. And if there are many packets using the same source MAC but different source IPs, this will still be recorded as one End Point, am I right?
On that particular BD, there's only one subnet and one EPG associated with it, with one voice router (that has one IP interface on it). This router leads nowhere and its gateway is the ACI. All traffic from that router will come from the same physical interface with the same private IP.
I took a further look just now at EPG -> History -> Events and I can see all sorts of entries were created at the time of the issue. I am seeing 100s if not more, from IPs that belong to the user access network to public IP addresses that we do not own. I don't know how those IPs got associated with the same MAC, but it's certainly very weird.
I will go ahead and log a case with TAC and see what they say, because even though I didn't enable "Limit IP Learning to Subnet" initially, I still wouldn't have thought the fabric would make those associations if all it is looking at is the source IP address. I will share my findings once I know more.
No official conclusion. I reviewed this thread and, based on the symptoms, I am confident that an L3 device was plugged into Jackson's ACI inside of an EPG. This could be a router, firewall, load balancer, etc. IPs which pass through the L3 device and into ACI will be learned with the mac address of the L3 device due to the data plane learning that ACI does on the EPG/BD.
Anytime an L3 device is connected to ACI via an EPG or L2 out, you must apply one of the following configurations on the bridge domain(s) connected to the L3 device:
A. Disable unicast routing so that the BD does not learn IPs in the first place
B. If it is required for the BD to route traffic (unicast routing enabled), then you must also limit the IP learning to the BD subnet. There is a checkbox for this under the bridge domain settings.
When IP addresses from other subnets (outside of the bridge domain) go to a routed load balancer/firewall and then back into ACI, ACI will not learn the IPs from those outside subnets (as long as the checkbox is enabled on the BD). This is a common design/configuration issue and is easily overlooked.
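As a rough illustration of what the checkbox decides, the Python sketch below checks whether a source IP falls inside any subnet configured on the BD. This is only a model of the decision, not how the leaf implements it; the function name should_learn_ip and the example addresses are assumptions, and the BD subnet is written as a gateway address with a mask (for example 10.1.1.1/24).

```python
import ipaddress


def should_learn_ip(src_ip, bd_subnets, limit_to_subnet=True):
    """Decide whether a source IP would be learned on a bridge domain.

    bd_subnets: BD subnets given as gateway/mask strings, e.g. "10.1.1.1/24".
    limit_to_subnet: models the "Limit IP learning to subnet" checkbox.
    """
    if not limit_to_subnet:
        # Checkbox off: any source IP seen on the port is learned.
        return True
    address = ipaddress.ip_address(src_ip)
    # strict=False lets us pass the gateway address instead of the network address.
    return any(
        address in ipaddress.ip_network(subnet, strict=False)
        for subnet in bd_subnets
    )


bd = ["10.1.1.1/24"]
in_subnet = should_learn_ip("10.1.1.10", bd)           # host inside the BD subnet
routed_through = should_learn_ip("203.0.113.7", bd)    # IP that came through an L3 device
unchecked = should_learn_ip("203.0.113.7", bd, False)  # same IP with the checkbox off
```

With the checkbox modeled as on, only 10.1.1.10 is learned. With it off, the outside address 203.0.113.7 is learned as well, which matches the symptom described earlier in the thread.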
I would recommend looking through the service graph guide below. The guide contains great information about BD/VRF considerations when plugging a routed device into your ACI fabric (regardless of whether you're doing service graph or not).
Hi Nathan and Jason,
I did end up logging a TAC case, but unfortunately the logs on ACI had rolled over, so TAC wasn't able to find the root cause.
It was indeed a 3825 router attached to that particular EPG. However, that is a voice router for internal communication and I am certain ACI will only see 1 IP and 1 MAC coming from that router. It is not routing traffic for anything else, so that router is more like a PC/server.
For whatever reason, ACI was mapping a lot of different private and public IPs against the MAC address of the 3825, even public addresses that are not owned by the company I was working for. Once I enabled "limit IP learning to subnet", everything went away.
Sorry I couldn't be more helpful.
Hello Jackson, our company is experiencing the same issue. We thought it was an issue with our VoIP gateway routers due to this bug, “CSCto02712”.
CISCO IOS BUG on Voice Router:
Symptoms: A router that is running Cisco IOS Release 15.1(4)M1 with “proxy-arp” enabled will incorrectly reply to duplicate address detection ARP requests sourced from end devices.
Some end devices will send an ARP request for their assigned IP to check for duplicate address detection per RFC5227. When this occurs the router should ignore this ARP request. With this issue, the router will respond to the request, which triggers the duplicate address detection on the end device and breaks connectivity between the router and end device.
Conditions: The symptom is observed with the following conditions:
– “proxy-arp” is enabled on client facing Layer-3 interface.
– end device sends a “duplicate address detection” ARP request on its local subnet.
Workaround 1: Configure no ip proxy-arp on the client-facing interface.
Workaround 2: Disable “duplicate address detection” on the end device.
So we changed both of our 3k VoIP routers and we are still seeing the same issue. We are working with TAC to see next steps.
Have you gotten anything back from Cisco? We observed the same thing, with a MAC claiming a bunch of addresses which it could not possibly have known. This caused a bunch of flapping which took down a couple of leafs. We have a case open, but the logs had wrapped by the time we got something.
After discussions with Cisco, we have decided to try this: move our VoIP gateways out of the current Bridge Domain, which is a “/23” subnet, into a new Bridge Domain (with the App Profile and End Point Group) that uses a “/31” subnet, and enable “limit IP learning” on that Bridge Domain.
Our thinking is to put those two routers on a smaller segment that won't have a large ARP table.
As Jason mentioned in his initial post, ACI with Unicast Routing enabled will learn endpoints via the leaf front panel ports and create EP entries with the MAC as the key and IPs as values. If we are dealing with a scenario in which ACI is reporting either many IPs against one MAC or MAC flapping, this is usually due to some behavior or configuration of the connected endpoint. Under no normal circumstance will ACI "inject" IPs into MACs for an endpoint without first having some live traffic with the matching Source IP/Source MAC to act on.
With that said, some end device configs/scenarios we have found that produce this type of "Many IPs to one MAC" or "MAC flapping" behavior are as follows:
Please note that this is but a subset of the behaviors seen, and all cases stem from actual traffic being sourced from the devices with the MACs/IPs in question.
I'm in the case where:
The faults are located on a Leaf where 5 VLANs are trunked via L2Stretched. I have configured an L3OUT on the SVI of one of these 5 VLANs.
Everything is working fine, but after the upgrade to 3.2.3n, I continue to see duplicate IP address messages. I'm sure that there is no real duplication.
TAC will be involved soon. Do you think the L3OUT can be the cause?
L3outs do not partake in endpoint learning in the manner that EPGs with static bindings do. At first thought, I would not think your l3out config has any bearing on the IP movement.
If you are saying that you have hits when running "less /var/log/dme/log/epmc-trace.txt | grep -i moved", then you should be able to grab the sclass of the epgs as well as the MACs that the IPs are moving between. In most cases, this should map to standard EPGs as opposed to l3outs.
TAC should definitely be able to help clarify the source of dup IPs.
Thanks for your fast reply. To confirm your opinion, we first configured the L3OUT on a dedicated routed port, and then we powered off the L3OUT.
In both cases the MAC movements were still present. I'm waiting for TAC's help, but I am starting to think that the duplication was already present on the legacy network and ACI with the new release is now showing it.