08-03-2023 03:33 AM
08-07-2023 09:59 PM - edited 08-07-2023 10:01 PM
Hi @Gil Mery ,
I have some questions regarding the way ACI learns endpoints. I have read https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739989.html and ran some tests to confirm that ACI behaves the way I expect and that my understanding is correct.
APIC version: 5.2(3g)
Leaf and spine version: 15.2(3g)
Here is the test:
I have 2 servers connected by vPC (this is just what I had to work with) to 2 different pairs of leaves.
Each server has an LACP bond associated with an EPG and BD, on which I tested the configuration changes described below. Both servers are configured with an IP on the same subnet.
Before each test scenario I turned off the bond interface and waited until the server's MAC was gone from the Endpoints in BD tab in the APIC, and then ran the next test. I configured a custom endpoint retention policy for the BD that remained the same in every test scenario.
Endpoint Retention Policy:
Hold Interval: 5
Bounce Entry Aging: 180
Local Endpoint Aging Interval: 150
Remote Endpoint Aging Interval: 120
Move Frequency: 256
L3 Unicast Routing is enabled throughout all tests.
Test Scenario 1:
L2 Unknown Unicast=Hardware Proxy
ARP Flooding=Disabled
No BD Subnet configured
Ping between the 2 servers doesn't work.
This makes sense. When server A pings server B, server A doesn't know server B's MAC address, so it sends an ARP request. Leaf A receives the ARP request and, since ARP flooding is disabled, tries to send it as unicast toward server B's IP address. Leaf A doesn't know IP B, so it sends the packet to the spine proxy. The spine proxy doesn't know the IP of server B either and tries to flood an ARP message in the BD (ARP gleaning). Since there is no SVI in the BD (no BD subnet configured), the packet is discarded; ARP doesn't work, hence ping doesn't work.
Can you confirm I described the process correctly?
Test Scenario 2:
L2 Unknown Unicast=Hardware Proxy
ARP Flooding=Disabled
BD Subnet is configured
IP Data-Plane Learning=Enabled
Ping between the 2 servers works.
Again this makes sense. The same process happens as in test scenario 1, but this time the BD has an SVI and ARP gleaning works. When Server A sent an ARP request for server B, ACI learned Server A's IP thanks to data-plane learning, so when Server B sent a unicast ARP reply back to server A, the spine proxy already had a record of server A's endpoint.
Am I correct in assuming that is why ping works this time?
Test Scenario 3:
L2 Unknown Unicast=Hardware Proxy
ARP Flooding=Enabled
No BD Subnet configured
Ping between the 2 servers works.
Again this makes sense. The same process happens as in test scenario 1, but this time leaf A doesn't try to send the ARP request as unicast; it floods the ARP in the BD, and as a result server B responds.
Again, I would love confirmation.
Test Scenario 4:
L2 Unknown Unicast=Hardware Proxy
ARP Flooding=Disabled
BD Subnet is configured
IP Data-Plane Learning=Disabled
Ping between the 2 servers works.
This doesn't make sense to me. The documentation I listed at the top explicitly states that when IP Data-Plane Learning is disabled, L2 Unknown Unicast needs to be set to Flood and ARP flooding must be enabled. Remember what I mentioned in Test Scenario 2: the reason Server B's unicast ARP reply can reach Server A is that the spine proxy was able to learn Server A's endpoint, thanks to data-plane learning, when Server A originated its ARP.
If anyone has an explanation for this I would love to hear it.
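The reasoning in test scenarios 1 to 3 can be written down as a small decision function. This is only a sketch of the process as described in this post, not of ACI's actual forwarding pipeline; the names `BridgeDomain` and `arp_request_outcome` and the outcome strings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeDomain:
    """Hypothetical model of the two BD settings varied in scenarios 1-3."""
    arp_flooding: bool
    has_subnet: bool  # True when a BD subnet (SVI) is configured


def arp_request_outcome(bd, leaf_knows_target_ip=False, spine_knows_target_ip=False):
    """Return what happens to an ARP request, following the post's description."""
    if bd.arp_flooding:
        # Scenario 3: the request is flooded in the BD and reaches the target.
        return "flooded in BD"
    if leaf_knows_target_ip:
        return "unicast from leaf"
    if spine_knows_target_ip:
        return "unicast via spine proxy"
    if bd.has_subnet:
        # Scenario 2: the BD has an SVI, so ARP gleaning can find the target.
        return "ARP glean"
    # Scenario 1: no SVI to glean from, so the request is discarded.
    return "dropped"


print(arp_request_outcome(BridgeDomain(arp_flooding=False, has_subnet=False)))
print(arp_request_outcome(BridgeDomain(arp_flooding=False, has_subnet=True)))
print(arp_request_outcome(BridgeDomain(arp_flooding=True, has_subnet=False)))
```

The three calls correspond to scenarios 1, 2 and 3 with silent hosts, where neither the leaf nor the spine proxy knows the target IP.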
OK. I've finally worked out that your question is about the behaviour of ACI when IP Data-Plane Learning is disabled and ARP Flooding is disabled (against the advice of your reference document).
For the record - that document states:
When the IP Data-plane Learning option is disabled, endpoint learning behavior on an ACI leaf changes as follows:
● Local MACs and remote MACs are learned via the data plane (no change with this option).
● Local IPs are not learned via the data plane.
● Local IPs are learned from ARP/GARP/ND via the control plane.
● Remote IPs are not learned from unicast packets via the data plane.
● Remote IPs are learned from multicast packets via the data plane.
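For quick reference, the five bullets above can be encoded as a lookup. This is just a restatement of the quoted list for the case where IP Data-plane Learning is disabled; the function name and the string labels are invented for this sketch, and any combination the list doesn't mention returns `None` rather than a guess.

```python
from typing import Optional

# (address type, scope, source) -> learned? Taken from the quoted bullets only.
IP_RULES = {
    ("ip", "local", "data_unicast"): False,         # local IPs: not via data plane
    ("ip", "local", "data_multicast"): False,       # local IPs: not via data plane
    ("ip", "local", "control_arp_garp_nd"): True,   # local IPs: ARP/GARP/ND
    ("ip", "remote", "data_unicast"): False,        # remote IPs: not from unicast
    ("ip", "remote", "data_multicast"): True,       # remote IPs: from multicast
}


def learned_with_ip_dp_learning_disabled(addr: str, scope: str, source: str) -> Optional[bool]:
    """True/False if the quoted list covers the case, None if it doesn't say."""
    if addr == "mac" and source.startswith("data_"):
        # Local and remote MACs are still learned via the data plane.
        return True
    return IP_RULES.get((addr, scope, source))


print(learned_with_ip_dp_learning_disabled("mac", "remote", "data_unicast"))
print(learned_with_ip_dp_learning_disabled("ip", "local", "data_unicast"))
print(learned_with_ip_dp_learning_disabled("ip", "remote", "control_arp_garp_nd"))
```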
So let's see what should happen with your two servers in Test Scenario 4:
[Caveat: I have not actually TESTED this, I'm just writing my logic as I go as to what *I* think should happen]
Assumption: No MAC addresses of the relevant servers have been learned by any switch
Scenario: Server A pings Server B. As you stated, A & B are on the same subnet.
I hope this helps.
08-09-2023 01:17 AM
Hey @RedNectar, thank you very much for taking the time to read through my questions and for providing such a detailed response.
I'll start by saying that yes, my main question here is whether or not I should set L2 Unknown Unicast to Flood instead of Hardware Proxy when IP-Data Plane learning is disabled.
As for what you think is happening internally in ACI in test scenario 4. There is a section in https://community.cisco.com/t5/application-centric-infrastructure/understanding-endpoint-learning-in-aci/m-p/4898143#M14408 explaining exactly why L2 Unknown Unicast should be set to Flood but as I mentioned I tried running a test and things seemed to be working fine even with Hardware Proxy. However, only now I noticed that the local MAC addresses are actually learned before I send out a ping (even after I turned off the vPC connection and waited for them to clear out), meaning that unlike what I thought the hosts weren't silent.
I turned off a few services on the host that I saw were generating traffic (this is not a new server I brought in for the test) and after I did that and tried to send a ping between the 2 servers with ip-data plane learning disabled and hardware proxy the ping didn't work just like the documentation states. When I changed hardware proxy to flood the ping worked, just like the documentation states.
Now, since you took the time to describe the process in scenario 4 in detail, I'd like to return the favor and reply with what I think (according to the documentation) happens.
- I believe your step 2 is incorrect. Leaves 1 and 2 don't learn the IP address of Server A. Remember, data-plane learning is disabled, so only control plane traffic is used to learn endpoints. I am not sure why, but ARP requests don't trigger endpoint learning, only ARP replies do. If anyone could shed some light on this it would be great, but this is a very consistent behavior throughout Cisco's documentation. Now, according to the whitepaper I linked, the MAC address of Server A isn't learned either because IP Data-Plane Learning is disabled. I don't know why that is, since even with IP data-plane learning disabled MAC addresses should be learned via the data plane. Again, if anyone has an explanation it would be great to hear.
- ARP Flooding is in fact enabled, so ARP gleaning doesn't occur; rather, the ARP request is flooded in the network. So the original ARP request from Server A reaches Server B. Here again, according to the documentation, Leaves 3 and 4 don't learn MAC A, and I'm again confused as to why disabling IP Data-Plane Learning prevents the leaf from learning MAC addresses via the data plane.
- Server B returns a Unicast ARP reply (leaf 3 and 4 learn IP and MAC of B), which reaches the Spine Proxy that has no recollection of A's MAC or IP. The reply is dropped, ARP doesn't work and ping doesn't work.
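The last step, together with the lab result above (ping fails with Hardware Proxy, works with Flood), boils down to how a unicast frame to an unknown MAC is handled. Here is a minimal sketch of that idea, assuming a heavily simplified model in which each leaf and the spine proxy just hold a set of known MACs; `deliver_unicast` and its arguments are hypothetical names, not anything from ACI.

```python
def deliver_unicast(dst_mac, leaf_macs, spine_macs, l2_unknown_unicast):
    """Return the fate of a unicast frame (e.g. Server B's ARP reply to MAC A).

    l2_unknown_unicast is "proxy" (Hardware Proxy) or "flood", mirroring the
    BD setting discussed in this thread.
    """
    if dst_mac in leaf_macs:
        return "forwarded by leaf"
    if l2_unknown_unicast == "flood":
        # Unknown destination on the leaf: flood it within the BD.
        return "flooded in BD"
    if dst_mac in spine_macs:
        return "forwarded via spine proxy"
    # Hardware Proxy and the spine proxy has no entry for the destination.
    return "dropped"


mac_a = "00:00:00:00:00:0a"  # placeholder MAC for Server A

# Silent host: nobody has learned MAC A yet.
print(deliver_unicast(mac_a, set(), set(), "proxy"))
print(deliver_unicast(mac_a, set(), set(), "flood"))
```

With nothing learned, the Hardware Proxy case ends in a drop while the Flood case still delivers the reply, which matches what I saw once the hosts were really silent.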
Does this make sense to you, @RedNectar? It does make sense to me, apart from the points of confusion I raised in my response. I am calmer now than I was when I opened this thread, since the behavior described in the documentation is exactly what happens in my lab.
08-10-2023 12:33 AM
Hi @Gil Mery ,
You are absolutely right about my Step 2 above. But it took me a while to figure it out - and I recorded my ramblings whilst I was trying to find out.
WARNING - Nobody wants to listen to ALL of this, it's pretty much unedited and raw. Listen at your own risk.
Part 1 shows me trying to justify why my explanation was correct. But I missed seeing a crucial packet at about 8:02 - where I highlighted an ARP request which was actually an ARP Glean for my Server A (LH Side server). Embarrassingly, I recognised it as an ARP glean, but didn't realise it was gleaning Server A's IP - not Server B's. At about 9:57 you can hear that I begin to doubt myself (with good reason - and you'll see I resolve that in Part 3).
Part 2 is just me embarrassing myself as I try to show that the leaf will learn a MAC and IP from an ARP request. And failing. You probably don't want to watch this one!
Part 3 shows me repeating Part 1 really - the key point is the screendump I've inserted into the text below as I edit my original logic.
Note any additions I've made today are in bold orange, but I think I've now got the process sorted.
Scenario: Server A pings Server B. As you stated, A & B are on the same subnet.
Now there's still a problem. @Gil Mery says that his Scenario 4 did NOT give ping replies. And mine did. So there is still a mystery! But enough time wasted today.
08-10-2023 02:05 AM
Hey @RedNectar again thank you for your time and the willingness to help.
I want to be really focused when replying to you. As I'm pretty swamped at work it might take me a few days for me to reply. Also a heads up my lab is air gapped, so it might be difficult to extract pictures and so on but I'm still hoping to provide you with a quality response.
09-01-2023 11:46 PM