Re: Understanding Endpoint Learning in ACI

Gil Mery · ‎08-03-2023

Hey Cisco,

I have some questions regarding the way ACI learns endpoints. I have read https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739989.html and ran some tests to try and confirm that what I expect to happen in ACI actually happens and my understanding is correct.

APIC versions are 5.2(3g)

Leaves + Spines versions are 15.2(3g)

Here is the test:

I have 2 Servers connected by vpc (this is just what I had to work with) to 2 different pairs of leaves.

Each server has an LACP bond associated with an EPG and BD that I tested the configuration changes I am about to describe here. Both servers are obviously configured with an IP on the same subnet.

Before each test scenario I turned off the bond interface and waited until the server's MAC was gone from the Endpoints in BD tab in the APIC and then I ran the next test. I configured a custom endpoint retention policy for the BD that remained the same at each test scenario.

Endpoint Retention Policy:

Hold Interval: 5

Bounce Entry Aging: 180

Local Endpoint Aging Interval: 150

Remote Endpoint Aging Interval: 120

Move Frequency: 256
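
A minimal sketch of the same policy written as data, in the shape of an APIC REST payload. The class name `fvEpRetPol` and its attribute names are recalled from the ACI object model and are assumptions that should be checked against your APIC version. The policy name `lab-ep-retention` is hypothetical.

```python
import json

# Values taken from the retention policy listed above.
RETENTION = {
    "hold_interval": 5,
    "bounce_entry_aging": 180,
    "local_endpoint_aging": 150,
    "remote_endpoint_aging": 120,
    "move_frequency": 256,
}


def build_retention_payload(name, values):
    """Build a dict shaped like an fvEpRetPol object (attribute names are assumed)."""
    return {
        "fvEpRetPol": {
            "attributes": {
                "name": name,  # hypothetical policy name
                "holdIntvl": str(values["hold_interval"]),
                "bounceAgeIntvl": str(values["bounce_entry_aging"]),
                "localEpAgeIntvl": str(values["local_endpoint_aging"]),
                "remoteEpAgeIntvl": str(values["remote_endpoint_aging"]),
                "moveFreq": str(values["move_frequency"]),
            }
        }
    }


payload = build_retention_payload("lab-ep-retention", RETENTION)
print(json.dumps(payload, indent=2))
```

Keeping the values in one dict makes it easy to confirm that the same policy was in force for every scenario.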

L3 Unicast Routing is enabled throughout all tests.

Test Scenario 1:

L2 Unknown Unicast=Hardware Proxy

Arp Flooding=Disabled

No BD Subnet configured

Ping between 2 servers doesn't work.

This makes sense. When server A pings server B, server A doesn't know server B's MAC address, so it sends an ARP request. Leaf A receives the ARP request and, since ARP flooding is disabled, tries to send it as unicast to server B's IP address. Leaf A doesn't know IP B, so it sends the packet to the spine proxy. The spine proxy doesn't know the IP of server B either and tries to flood an ARP message to the BD (ARP gleaning). Since there is no SVI in the BD (no BD subnet configured), the packet is discarded. ARP doesn't work, hence ping doesn't work.

Can you confirm I described the process correctly?

Test Scenario 2:

L2 Unknown Unicast=Hardware Proxy

Arp Flooding=Disabled

BD Subnet is configured

IP Data-Plane Learning=Enabled

Ping between 2 servers works.

Again this makes sense. The same process happens as in test scenario 1, but this time the BD has an SVI and ARP gleaning works. When Server A sent an ARP request for server B, ACI learned Server A's IP thanks to data-plane learning, so when Server B sent a unicast ARP reply back to server A, the spine proxy already had a record of server A's endpoint.

Am I correct by assuming that is why this time ping works?

Test Scenario 3:

L2 Unknown Unicast=Hardware Proxy

Arp Flooding=Enabled

No BD Subnet configured

Ping between 2 servers works.

Again this makes sense. The same process happens as in test scenario 1, but this time leaf A doesn't try to send the ARP as unicast; it floods the ARP in the BD, and as a result server B responds.

Again I would love confirmation.

Test Scenario 4:

L2 Unknown Unicast=Hardware Proxy

Arp Flooding=Disabled

BD Subnet is configured

IP Data-Plane Learning=Disabled

Ping between 2 servers works.

This doesn't make sense to me. The documentation I listed at the top explicitly states that when IP Data-Plane Learning is disabled, L2 Unknown Unicast needs to be set to Flood and ARP flooding must be enabled. Remember what I mentioned in Test Scenario 2: the reason Server B's unicast ARP reply knows how to reach Server A is that the Spine Proxy was able to learn Server A's endpoint, thanks to data-plane learning, when Server A originated its ARP.

If anyone has an explanation to this I would love to hear.
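
For quick reference, here are the four scenarios as a small table in code. This only restates the settings and the ping results as first observed in this post. The `Scenario` class and its field names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    number: int
    l2_unknown_unicast: str                # "proxy" means Hardware Proxy
    arp_flooding: bool
    bd_subnet: bool
    ip_dataplane_learning: Optional[bool]  # None = not stated for that scenario
    ping_works: bool                       # result as first observed


SCENARIOS = [
    Scenario(1, "proxy", False, False, None, False),
    Scenario(2, "proxy", False, True, True, True),
    Scenario(3, "proxy", True, False, None, True),
    Scenario(4, "proxy", False, True, False, True),
]


def summarize(s):
    """Return a one-line, human-readable summary of a scenario."""
    learning = {None: "n/a", True: "on", False: "off"}[s.ip_dataplane_learning]
    return (
        f"Scenario {s.number}: arp_flood={'on' if s.arp_flooding else 'off'}, "
        f"bd_subnet={'yes' if s.bd_subnet else 'no'}, "
        f"ip_dp_learning={learning}, "
        f"ping={'works' if s.ping_works else 'fails'}"
    )


for scenario in SCENARIOS:
    print(summarize(scenario))
```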

RedNectar · ‎08-07-2023

Hi @Gil Mery ,

I have some questions regarding the way ACI learns endpoints. I have read https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739989.html and ran some tests to try and confirm that what I expect to happen in ACI actually happens and my understanding is correct.

APIC versions are 5.2(3g)

Leaves + Spines versions are 15.2(3g)

Here is the test:

I have 2 Servers connected by vpc (this is just what I had to work with) to 2 different pairs of leaves.

Each server has an LACP bond associated with an EPG and BD that I tested the configuration changes I am about to describe here. Both servers are obviously configured with an IP on the same subnet.

OK. I can't work without a picture (which is why this question has sat in my inbox for a week before getting around to answering it)

Before each test scenario I turned off the bond interface and waited until the server's MAC was gone from the Endpoints in BD tab in the APIC and then I ran the next test. I configured a custom endpoint retention policy for the BD that remained the same at each test scenario.

Endpoint Retention Policy:

Hold Interval: 5

Bounce Entry Aging: 180

Local Endpoint Aging Interval: 150

Remote Endpoint Aging Interval: 120

Move Frequency: 256

L3 Unicast Routing is enabled throughout all tests.

Test Scenario 1:

L2 Unknown Unicast=Hardware Proxy

Arp Flooding=Disabled

No BD Subnet configured

Ping between 2 servers doesn't work.

This makes sense. When server A pings server B, server A doesn't know server B's MAC address, so it sends an ARP request. Leaf A receives the ARP request and, since ARP flooding is disabled, tries to send it as unicast to server B's IP address. Leaf A doesn't know IP B, so it sends the packet to the spine proxy. The spine proxy doesn't know the IP of server B either and tries to flood an ARP message to the BD (ARP gleaning). Since there is no SVI in the BD (no BD subnet configured), the packet is discarded. ARP doesn't work, hence ping doesn't work.

Can you confirm I described the process correctly?

Correct. Although you may as well say Unicast routing is disabled, because you have no subnets configured.

Test Scenario 2:

L2 Unknown Unicast=Hardware Proxy

Arp Flooding=Disabled

BD Subnet is configured

IP Data-Plane Learning=Enabled

Ping between 2 servers works.

Again this makes sense. The same process happens as in test scenario 1, but this time the BD has an SVI and ARP gleaning works. When Server A sent an ARP request for server B, ACI learned Server A's IP thanks to data-plane learning, so when Server B sent a unicast ARP reply back to server A, the spine proxy already had a record of server A's endpoint.

Am I correct by assuming that is why this time ping works?

Yes Correct again

Test Scenario 3:

L2 Unknown Unicast=Hardware Proxy

Arp Flooding=Enabled

No BD Subnet configured

Ping between 2 servers works.

Again this makes sense. The same process happens as in test scenario 1, but this time leaf A doesn't try to send the ARP as unicast; it floods the ARP in the BD, and as a result server B responds.

Again I would love confirmation.

Yes Correct again. Spot on.
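
The handling of the ARP request in scenarios 1 to 3 comes down to one small decision, sketched below. This is a toy model of the reasoning in this thread, not ACI's actual implementation. The function name and the returned strings are hypothetical.

```python
def arp_request_path(arp_flooding, bd_subnet, target_ip_known):
    """Toy model: what happens to an ARP request for a given BD configuration.

    arp_flooding    -- BD "ARP Flooding" setting
    bd_subnet       -- True if the BD has a subnet (SVI) configured
    target_ip_known -- True if the fabric already knows the target IP
    """
    if arp_flooding:
        # Scenario 3: the request is simply flooded within the BD.
        return "flood in BD"
    if target_ip_known:
        # Flooding is off and the target is known: deliver as unicast.
        return "unicast to target"
    if bd_subnet:
        # Scenario 2: unknown target, but the BD SVI can source an ARP glean.
        return "ARP glean from BD SVI"
    # Scenario 1: unknown target and no SVI to glean from.
    return "drop"


print(arp_request_path(arp_flooding=False, bd_subnet=False, target_ip_known=False))
print(arp_request_path(arp_flooding=False, bd_subnet=True, target_ip_known=False))
print(arp_request_path(arp_flooding=True, bd_subnet=False, target_ip_known=False))
```

The three calls correspond to scenarios 1, 2 and 3 with a silent target.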

Test Scenario 4:

L2 Unknown Unicast=Hardware Proxy

Arp Flooding=Disabled

BD Subnet is configured

IP Data-Plane Learning=Disabled

Ping between 2 servers works.

This doesn't make sense to me. The documentation I listed at the top explicitly states that when IP Data-Plane Learning is disabled, L2 Unknown Unicast needs to be set to Flood and ARP flooding must be enabled. Remember what I mentioned in Test Scenario 2: the reason Server B's unicast ARP reply knows how to reach Server A is that the Spine Proxy was able to learn Server A's endpoint, thanks to data-plane learning, when Server A originated its ARP.

If anyone has an explanation to this I would love to hear.

OK. I've finally worked out that your question is about the behaviour of ACI when IP Data-Plane learning is disabled and ARP Flooding is disabled (against the advice of your reference document).

For the record - that document states:

When the IP Data-plane Learning option is disabled, endpoint learning behavior on an ACI leaf changes as follows:

● Local MACs and remote MACs are learned via the data plane (no change with this option).

● Local IPs are not learned via the data plane.

● Local IPs are learned from ARP/GARP/ND via the control plane.

● Remote IPs are not learned from unicast packets via the data plane.

● Remote IPs are learned from multicast packets via the data plane.
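
Those five bullets can be written as a small lookup, sketched below. It encodes only the quoted rules. Any combination the bullets do not mention (for example, a remote endpoint seen via ARP) falls back to MAC only, which is an assumption of this sketch. The function name and the trigger labels are hypothetical.

```python
def learned_fields(locality, trigger):
    """What a leaf learns when IP Data-plane Learning is DISABLED (toy model).

    locality -- "local" or "remote"
    trigger  -- "arp_garp_nd", "unicast" or "multicast"
    Returns a set containing "mac" and possibly "ip".
    """
    if locality not in ("local", "remote"):
        raise ValueError(f"unknown locality: {locality}")
    if trigger not in ("arp_garp_nd", "unicast", "multicast"):
        raise ValueError(f"unknown trigger: {trigger}")

    learned = {"mac"}  # local and remote MACs are still learned (no change)
    if locality == "local" and trigger == "arp_garp_nd":
        learned.add("ip")  # local IPs come from ARP/GARP/ND via the control plane
    elif locality == "remote" and trigger == "multicast":
        learned.add("ip")  # remote IPs are still learned from multicast packets
    # Everything else (including cases the bullets do not cover): MAC only.
    return learned


print(sorted(learned_fields("local", "unicast")))
print(sorted(learned_fields("local", "arp_garp_nd")))
print(sorted(learned_fields("remote", "multicast")))
```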

So let's see what should happen with your two servers in Test Scenario 4:

[Caveat: I have not actually TESTED this, I'm just writing my logic as I go as to what *I* think should happen]

Assumption: No MAC addresses of the relevant servers have been learned by any switch

Scenario: Server A pings Server B. As you stated, A & B are on the same subnet.

1. Server A sends an ARP request for Server B
2. The ARP reaches Leaf 1 or 2 (doesn't matter) and both leaves learn the MAC and IP of Server A
3. The MAC & IP addresses of server A are reported to the Spine Proxy
4. The ingress leaf switch has no knowledge of B's IP address, so will send the ARP request to the spine Proxy.
5. The Spine proxy has no knowledge of B's IP address, so
   1. caches the original ARP from A for later (see step 10)
   2. sends an ARP Glean to all switches in the BD
6. All switches will send an ARP request (seeking B's MAC) from the BD SVI address
7. Server B will receive the ARP and respond
8. The ARP reply reaches Leaf 3 or 4 (doesn't matter) and both leaves learn the MAC and IP of Server B
9. The MAC & IP addresses of server B are reported to the Spine Proxy
10. Meanwhile, the spine proxy still has the original ARP request from Server A seeking Server B's MAC (or maybe the first ARP has timed out and it now has a 2nd ARP request. No matter, same story). See 5.1 above
11. Since the Spine Proxy now knows that B's IP is on VPC2 TEP, it sends that original ARP request from A seeking B's MAC to VPC2 TEP
12. Either Leaf 3 or 4 gets the original ARP request from A seeking B's MAC, BUT
    1. because IP Data-Plane Learning=Disabled, Leaf 3/4 does NOT learn the IP address of Server A - just the MAC address
13. The original ARP request is forwarded to Server B (remember, leaves 3 & 4 learned Server B's IP and MAC in step
14. Server B sends a unicast ARP reply to A's MAC address
15. The ingress leaf switch has no knowledge of A's MAC (or IP) address, so will send the ARP reply to the spine Proxy.
16. The spine Proxy knows where MAC A is, so sends the reply to the VPC1 VTEP address on leaf 1 & 2 - one of these two switches gets the packet and sends it on to Server A - BUT
    1. because IP Data-Plane Learning=Disabled, Leaf 1/2 does NOT learn the IP address of Server B - just the MAC address
17. Server A sends an ICMP echo to MAC B
18. The ingress leaf knows where B's MAC lives, so forwards the L2 unicast frame to the VPC2 VTEP address.
19. Either switch 3 or 4 gets the L2 unicast frame and sends it to Server B
20. Server B sends an ICMP echo reply to MAC A
21. The ingress leaf knows where A's MAC lives, so forwards the L2 unicast frame to the VPC1 VTEP address.
22. Either switch 1 or 2 gets the L2 unicast frame and sends it to Server A
23. Rinse and repeat from step 17 for each ping

I hope this helps.

RedNectar aka Chris Welsh.
Forum Tips: 1. Paste images inline - don't attach. 2. Always mark helpful and correct answers, it helps others find what they need.

Gil Mery · ‎08-09-2023

Hey @RedNectar thank you very much for taking the time reading through my questions and providing such a detailed response.

I'll start by saying that yes, my main question here is whether or not I should set L2 Unknown Unicast to Flood instead of Hardware Proxy when IP-Data Plane learning is disabled.

As for what you think is happening internally in ACI in test scenario 4. There is a section in https://community.cisco.com/t5/application-centric-infrastructure/understanding-endpoint-learning-in-aci/m-p/4898143#M14408 explaining exactly why L2 Unknown Unicast should be set to Flood but as I mentioned I tried running a test and things seemed to be working fine even with Hardware Proxy. However, only now I noticed that the local MAC addresses are actually learned before I send out a ping (even after I turned off the vPC connection and waited for them to clear out), meaning that unlike what I thought the hosts weren't silent.

I turned off a few services on the host that I saw were generating traffic (this is not a new server I brought in for the test) and after I did that and tried to send a ping between the 2 servers with ip-data plane learning disabled and hardware proxy the ping didn't work just like the documentation states. When I changed hardware proxy to flood the ping worked, just like the documentation states.

Now, since you took the time to describe the process in scenario 4 in detail, I'd like to return the favor and reply with what I think (according to the documentation) happens.

- I believe your step 2 is incorrect. Leaves 1 and 2 don't learn the IP address of Server A. Remember, data-plane learning is disabled, so only control plane traffic is used to learn endpoints. I am not sure why, but ARP requests don't trigger endpoint learning, only ARP replies do. This is very consistent behavior throughout Cisco's documentation, and if anyone could shed some light on it, that would be great. Now, according to the whitepaper I linked, the MAC address of Server A isn't learned either because IP Data-Plane Learning is disabled. I don't know why that is so: even with IP data-plane learning disabled, MAC addresses should be learned via the data plane. Again, if anyone has an explanation, it would be great to hear.

- ARP Flooding is in fact enabled, so ARP gleaning doesn't occur, rather the ARP request is flooded in the network. So the original ARP request from Server A reaches Server B. Here again according to the documentation Leaf 3 and 4 don't learn MAC A and I'm again confused as to why disabling IP-Data Plane Learning prevents the leaf from learning MAC addresses via the data plane.

- Server B returns a Unicast ARP reply (leaf 3 and 4 learn IP and MAC of B), which reaches the Spine Proxy that has no recollection of A's MAC or IP. The reply is dropped, ARP doesn't work and ping doesn't work.

Does this make sense to you, @RedNectar? It does make sense to me, apart from the confusions I raised during my response. I am calmer now than I was when I opened this thread, since the behavior described in the documentation is exactly what happens in my lab.

RedNectar · ‎08-10-2023

Hi @Gil Mery ,

You are absolutely right about my Step 2 above. But it took me a while to figure it out - and I recorded my ramblings whilst I was trying to find out.

WARNING - Nobody wants to listen to ALL of this, it's pretty much unedited and raw. Listen at your own risk.

Part 1 shows me trying to justify why my explanation was correct. But I missed seeing a crucial packet at about 8:02 - where I highlighted an ARP request which was actually an ARP Glean for my Server A (LH Side server). Embarrassingly, I recognised it as an ARP glean, but didn't realise it was gleaning Server A's IP - not Server B's. At about 9:57 you can hear that I begin to doubt myself (with good reason - and you'll see I resolve that in Part 3)


Part 2 is just me embarrassing myself as I try to show that the leaf will learn a MAC and IP from an ARP request. And failing. You probably don't want to watch this one!


Part 3 shows me repeating Part 1 really - the key point is the screendump I've inserted into the text below as I edit my original logic.


Note any additions I've made today are in bold orange, but I think I've now got the process sorted.

Scenario: Server A pings Server B. As you stated, A & B are on the same subnet.

1. Server A sends an ARP request for Server B
2. The ARP reaches Leaf 1 or 2 (doesn't matter) ~~and both leaves learn the MAC and IP of Server A~~
3. ~~The MAC & IP addresses of server A are reported to the Spine Proxy~~
4. The ingress leaf switch has no knowledge of B's IP address, so will send the ARP request to the spine Proxy.
5. The Spine proxy has no knowledge of B's IP address, so
   1. caches the original ARP from A for later (see step 10)
   2. sends an ARP Glean to all switches in the BD
6. All switches will send an ARP request (seeking B's MAC) from the BD SVI address
7. Server B will receive the ARP and respond
8. The ARP reply reaches Leaf 3 or 4 (doesn't matter) and both leaves learn the MAC and IP of Server B
   - This seems to be correct - the ARP reply is considered Control Plane - not data plane
9. The MAC & IP addresses of server B are reported to the Spine Proxy
   - This seems to be correct
10. Meanwhile, the spine proxy still has the original ARP request from Server A seeking Server B's MAC (or maybe the first ARP has timed out and it now has a 2nd ARP request. No matter, same story). See 5.1 above
11. Since the Spine Proxy now knows that B's IP is on VPC2 TEP, it sends that original ARP request from A seeking B's MAC to VPC2 TEP
12. Either Leaf 3 or 4 gets the original ARP request from A seeking B's MAC, BUT
    1. because IP Data-Plane Learning=Disabled, Leaf 3/4 does NOT learn the IP address of Server A - just the MAC address
13. The original ARP request is forwarded to Server B (remember, leaves 3 & 4 learned Server B's IP and MAC in step
14. Server B sends a unicast ARP reply to A's MAC address
15. The ingress leaf switch has no knowledge of A's MAC (or IP) address, so will send the ARP reply to the spine Proxy.
16. ~~The spine Proxy knows where MAC A is, so sends the reply to the VPC1 VTEP address on leaf 1 & 2 - one of these two switches gets the packet and sends it on to Server A -~~
    - That's not what happens. Here's the correct story. The Spine Proxy does not know where A's IP or MAC is, so sends an ARP glean looking for the IP of server A. Here's a picture from my video#3 showing where I missed this happening. (The same ARP appears on both my Endpoints
    - And as I said before: because IP Data-Plane Learning=Disabled, Leaf 1/2 does NOT learn the IP address of Server B - just the MAC address
17. Server A sends an ICMP echo to MAC B
18. The ingress leaf knows where B's MAC lives, so forwards the L2 unicast frame to the VPC2 VTEP address.
19. Either switch 3 or 4 gets the L2 unicast frame and sends it to Server B
20. Server B sends an ICMP echo reply to MAC A
21. The ingress leaf knows where A's MAC lives, so forwards the L2 unicast frame to the VPC1 VTEP address.
22. Either switch 1 or 2 gets the L2 unicast frame and sends it to Server A
23. Rinse and repeat from step 17 for each ping
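
The spine proxy's part in the corrected sequence can be sketched as a toy lookup table that either forwards to a known TEP or triggers a glean. This is only an illustration of the steps as described here, not how the spine is implemented. The class, the method names and the `ip-A`/`ip-B` placeholders are all hypothetical.

```python
class SpineProxy:
    """Toy spine proxy: maps endpoint IPs to the TEP they were reported behind."""

    def __init__(self):
        self.known_ips = {}   # ip -> TEP name
        self.glean_log = []   # IPs for which a glean was triggered

    def report_endpoint(self, ip, tep):
        """A leaf reports a locally learned endpoint to the proxy."""
        self.known_ips[ip] = tep

    def forward_arp(self, target_ip):
        """Forward an ARP towards target_ip, or glean if the IP is unknown."""
        tep = self.known_ips.get(target_ip)
        if tep is None:
            self.glean_log.append(target_ip)
            return f"glean for {target_ip}"
        return f"forward to {tep}"


proxy = SpineProxy()
trace = []

# Step 5: A's ARP request arrives; B is unknown, so the proxy gleans for B.
trace.append(proxy.forward_arp("ip-B"))
# Steps 8-9: B answers the glean; leaves 3/4 learn B and report it.
proxy.report_endpoint("ip-B", "VPC2-TEP")
# Step 11: the ARP request for B can now be sent to VPC2 TEP.
trace.append(proxy.forward_arp("ip-B"))
# Corrected step 16: B's ARP reply targets A, which was never reported.
trace.append(proxy.forward_arp("ip-A"))

for line in trace:
    print(line)
```

The sketch stops at the glean for Server A and makes no claim about whether the ping then succeeds.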

Now there's still a problem. @Gil Mery says that his Scenario 4 did NOT give ping replies. And mine did. So there is still a mystery! But enough time wasted today.

RedNectar aka Chris Welsh.

Gil Mery · ‎08-10-2023

Hey @RedNectar again thank you for your time and the willingness to help.

I want to be really focused when replying to you. As I'm pretty swamped at work, it might take me a few days to reply. Also, a heads-up: my lab is air-gapped, so it might be difficult to extract pictures and so on, but I'm still hoping to provide you with a quality response.

RedNectar · ‎09-01-2023

Hi @Gil Mery ,

Just wondering - did you ever get a chance to work this one out?

RedNectar aka Chris Welsh.