07-30-2018 03:46 PM - edited 03-01-2019 05:36 AM
ACI has long been touted as a system where ARP requests are minimised. However, while exploring the ARP Gleaning feature recently (see BRKACI-3101 if you don't know what ARP Gleaning is), I found that for ARP requests to non-existent IPs, ACI multiplies the number of ARPs sent by up to a factor of 4, and adds additional ARPs carrying VLAN tags that could leak onto customer networks and potentially cause havoc.
Here is my setup:
I have four servers (Web Servers) in the same Bridge Domain and the same EPG. ARP flooding is Disabled on the BD, and the BD Subnet IP, 192.168.92.1, serves as the default gateway for each host.
A picture always helps.
Now, as I said, this was a little experiment to demonstrate how ARP Gleaning worked, so I sent a single ping from each of the four workstations above to a non-existent IP (192.168.92.99). In each case, this single ping generated 3 ARP requests from the originating host (Lubuntu). I ran Wireshark captures on all four workstations, and I've included my raw results below.
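For anyone who wants to sort through similar captures, here is a minimal Python sketch of how each captured frame could be classified: is it ARP, is it a broadcast or a unicast, and does it carry an 802.1Q tag? The function names (`build_arp_request`, `summarize_arp`) and the sample MAC address are hypothetical and only for illustration. It assumes raw Ethernet frames with at most one VLAN tag, and it is not the method used to produce the results in this post.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ETHERTYPE_DOT1Q = 0x8100
BROADCAST_MAC = b"\xff" * 6


def build_arp_request(src_mac, sender_ip, target_ip,
                      dst_mac=BROADCAST_MAC, vlan_id=None):
    """Build a raw ARP request frame (hypothetical helper for testing)."""
    header = dst_mac + src_mac
    if vlan_id is not None:
        # 802.1Q tag: TPID 0x8100, then priority 0 and the 12-bit VLAN ID
        header += struct.pack("!HH", ETHERTYPE_DOT1Q, vlan_id & 0x0FFF)
    header += struct.pack("!H", ETHERTYPE_ARP)
    # Ethernet (1), IPv4 (0x0800), MAC length 6, IP length 4, opcode 1 = request
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += src_mac + socket.inet_aton(sender_ip)
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)
    return header + arp


def summarize_arp(frame):
    """Return a dict describing an ARP frame, or None if it is not ARP."""
    if len(frame) < 14:
        return None
    dst_mac = frame[0:6]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    offset = 14
    vlan_id = None
    if ethertype == ETHERTYPE_DOT1Q:
        if len(frame) < 18:
            return None
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        vlan_id = tci & 0x0FFF
        offset = 18
    if ethertype != ETHERTYPE_ARP or len(frame) < offset + 28:
        return None
    _, _, hlen, plen, opcode = struct.unpack("!HHBBH", frame[offset:offset + 8])
    if (hlen, plen) != (6, 4):
        return None
    return {
        "opcode": opcode,                      # 1 = request, 2 = reply
        "broadcast": dst_mac == BROADCAST_MAC,
        "vlan_id": vlan_id,                    # None means untagged
        "sender_ip": socket.inet_ntoa(frame[offset + 14:offset + 18]),
        "target_ip": socket.inet_ntoa(frame[offset + 24:offset + 28]),
    }


# Example: a tagged broadcast ARP request for the non-existent IP
sample = build_arp_request(bytes.fromhex("005056aabbcc"),
                           "192.168.92.1", "192.168.92.99", vlan_id=500)
print(summarize_arp(sample))
```

Grouping the summaries by `sender_ip`, `broadcast` and `vlan_id` makes it easy to count how many ARPs each host received and to spot any that arrive with a tag the host should never see.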
Findings were exactly the same for all three scenarios listed above (where the Bare Metal Host was attached first as Access (802.1P), then as Access (Untagged), then as Trunk).
These results beg a few questions:
Does anyone have any answers?
Here are my raw results
Result:
VM Attached to Leaf 101 – the one sending the ARPs sees:
VM Attached to Leaf 102 sees:
BMH Attached to Leaf 102 sees:
Host receiving traffic via VPC Attached to Leaf101+102 sees:
Result:
VM Attached to Leaf 102 – the one sending the ARPs sees:
VM Attached to Leaf 101 sees:
BMH Attached to Leaf 102 sees:
Host receiving traffic via VPC Attached to Leaf101+102 sees:
Result:
BMH Attached to Leaf 102 – the one sending the ARPs sees:
VM Attached to Leaf 101 sees:
VM Attached to Leaf 102 sees:
Host receiving traffic via VPC Attached to Leaf101+102 sees:
Result:
Host attached via switch via VPC attached to Leaf101+102
BMH Attached to Leaf 102 sees:
VM Attached to Leaf 101 sees:
VM Attached to Leaf 102 sees:
The results were exactly the same as 802.1P encapsulation
The results were exactly the same with the following exceptions:
Screendumps for Test#1
Screendumps for Test#2
Screendumps for Test#3
Screendumps for Test#4
08-04-2018 06:03 AM - edited 08-10-2018 06:26 PM
Wow! First off, great post, as always. It caught my attention, as I have had a todo for some time to get some packet captures to gain a little more insight into the ARP gleaning process before I start making any recommendations one way or another re: ARP flooding. Basically, to make sure that I understood it before others start asking about it. :)
At first I was thinking that you had a specific corner case, but you identified some behaviors that simply shouldn't be. The kicker at the end was most disturbing, re: the additional cvid-encapsulated ARP request.
My gut tells me that it might be platform-specific. In particular, re: forwarding behavior in 3.2 on the first-generation leaf switches vs. second-generation leaf switches. I'm pretty sure it was the Orlando 2018 BRKACI-3545 where said behavior changes and differences were noted.
I'm curious, do you have a second-gen leaf switch you could use, but leave everything else the same? And / or maybe try this with an older release? 3.0 maybe?
My lab is running 3.0(2k). I have a mix of first- and second-gen leaf switches. I have a 9372PX pair, a 9372PX-E pair, and a 93180YC-EX pair. Unfortunately, I just don't have the time during the week, and this is not a good weekend for me to try to get deep into the muck. I can probably duplicate your setup maybe some time next weekend, though.
Edit: Fixed a typo of my own; Fixed my uh-oh: I'm running 3.0(2k) not (1k).
08-10-2018 06:45 PM
I am running 3.0(2k) not 3.0(1k). Sorry about that.
Anyway, I tried to duplicate your problem in my lab today, but as it turns out, someone put the 1st-gen leaf switches that were in the lab into production! So I was unable to test the 1st-gen theory. For now, anyway.
Good news, I suppose: Running 3.0(2k), using 93180YC-EX switches, I did NOT see the same behavior out to a directly-attached host. I tried in all three modes (trunk, dot1p, and untagged). I also cleared endpoints after each change, just to be sure. In my case, I used encap vlan-500. The cvid on the leaf switch was not 500. I forgot exactly what it was, but what's important is that it wasn't 500.
In all three situations, I only saw the ARP requests from the default gateway being broadcast, as expected. I did not see any unicast ARP messages, nor did I see anything from any address other than the gateway.
When trunked, I saw the above frames tagged with the expected VID: 500. I did not see anything tagged with the cvid, nor did I see anything untagged.
In 802.1p and in untagged modes, I saw only untagged frames. Nothing unexpected.
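The checks described above can be written down as a short Python sketch. It encodes only what was observed in this lab (93180YC-EX on 3.0(2k)): tagged with the encap VLAN when trunked, untagged in 802.1p and untagged modes, and every ARP request a broadcast from the gateway. It is not a statement of how ACI behaves in general. The function names, the mode strings and the cvid value 37 are hypothetical.

```python
def expected_vlan_tag(mode, encap_vlan):
    """VLAN tag expected on frames towards the host (None = untagged)."""
    if mode == "trunk":
        return encap_vlan
    if mode in ("802.1p", "untagged"):
        return None
    raise ValueError(f"unknown port mode: {mode}")


def unexpected_frames(mode, encap_vlan, gateway_ip, frames):
    """Return the captured ARP frames that deviate from the expected pattern.

    Each frame is a dict with the keys 'sender_ip', 'vlan_id' and 'broadcast'.
    """
    expected_tag = expected_vlan_tag(mode, encap_vlan)
    return [
        f for f in frames
        if f["vlan_id"] != expected_tag      # wrong tag, e.g. the cvid
        or f["sender_ip"] != gateway_ip      # ARP from something other than the gateway
        or not f["broadcast"]                # unicast ARP
    ]


# Example: trunk mode with encap vlan-500; the last frame carries a made-up cvid
capture = [
    {"sender_ip": "192.168.92.1", "vlan_id": 500, "broadcast": True},
    {"sender_ip": "192.168.92.1", "vlan_id": 37, "broadcast": True},
]
print(unexpected_frames("trunk", 500, "192.168.92.1", capture))
```

An empty list means the capture matches what I saw; anything returned is a frame worth a closer look.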
So again, I suppose that is good news. I would like to test on this version with a 1st-gen switch, but I may not be able to do so.
We're staring down the barrel at a fabric upgrade and were asked to upgrade the lab to 3.2. Hopefully that will happen next week. Once that is completed, I will repeat the test. Again, on the 93180YC-EX switches at least, and will let you know what I see.
08-10-2018 10:30 PM
Thanks for testing on newer hardware! It certainly is good news!
08-17-2018 10:10 AM
Just to follow up, I just did the same test with 3.2(2o). Again, using 93180YC-EX switches. Same results for me as on 3.0(2k). I was really starting to suspect the version more than anything else, but maybe it is, in fact, something with 1st gen vs 2nd gen.
Still trying to get my hands on a 1st gen switch. I really want to try to duplicate your results.
12-02-2018 02:25 PM
Hello Chris,
Can you please help me understand the ARP requests for an unknown IP?
Apart from the ARP query, I have another one about unknown unicast.
Regards
Nilesh
12-02-2018 03:22 PM
Hello Chris,
I have got the answers from the Cisco Live presentation and video. However, your expert feedback is welcome.
BRKACI-3101.ACI.Under.the.Hood.-How.Your.Configuration.is.Deployed
Your blog on ACI ARP gleaning is also very good for understanding the need for ARP flooding, as well as the behavior of the spine when it misses the target IP.
https://rednectar.net/2018/08/13/aci-arp-gleaning/
Thanks
Nilesh
12-03-2018 08:39 PM
Hi Nilesh,
Did you get all the answers you need? That BRKACI-3101 presentation is my all time favourite!
Let me know if you need any more help.
12-06-2018 10:27 AM
Hello Chris,
Yes, all my queries related to ACI fabric forwarding are now answered.
I will now proceed to study advanced ACI topics such as service insertion, followed by programmability.
My priority is to complete the topics on the DC lab syllabus.
Thank You !
Nilesh