bl80
Beginner

VMM Endpoint not showing as Learned in the Learning Source of the EPGs

This is a new fabric being built.  It's nearly a one-for-one clone of another datacenter.  The vCenter integration setup is very standard, and everything was working up to the point of actually deploying VMs.  I can see the VMs under the port groups in the VMM VDS Port Groups.

When looking at App Profile > EPG > Operational > Client End-Points, the VMs do show there, but they do not show an IP address; they show only "vmm" under the Learning Source and do not show the interface they are being learned on.

Looking through the VMware vDS Integration troubleshooting documentation : 

https://www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2020/pdf/BRKACI-2645.pdf

 

I see this behavior and troubleshooting tips on page 44.  No faults are showing, and we have confirmed the Port Group Mapping.

Any suggested solutions for this?  We have opened a TAC case but have heard nothing back from them as of yet.  We have gone through every single configuration step we can compare against our fully functional fabric.  We have reset all ports, moved to the secondary vCenter, and rebooted every APIC, spine, and leaf in the fabric.

I can only assume this is something simple that we have missed. 

Appreciate any suggestions.  

ACCEPTED SOLUTION

Robert - thank you for the great info.  This issue has been resolved.  

The cause was improperly mapped physical adapters in the Virtual Switch configuration in vSphere.  The vmnics assigned were incorrect.

Unfortunately, that part of the setup falls completely outside my realm of knowledge and support ability.  Learning new stuff all the time, I guess.

The fix was to identify the correct MAC addresses listed under Host > Networking > Physical Adapters, and then map them properly under Host > Networking > Virtual Switches > Migrate Networking.
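The MAC-matching step in the fix above can be sketched as a small helper that compares the MACs the fabric expects on each uplink against the MACs actually assigned. This is purely an illustrative sketch; all vmnic names and MAC addresses below are invented.

```python
# Hypothetical sketch: check that the vmnic uplinks assigned to the vDS carry
# the physical-adapter MACs that are actually cabled to the ACI leaf.
# All names and MACs below are made-up example data.

def find_mismapped_uplinks(expected, actual):
    """Return vmnics whose assigned MAC differs from the expected one.

    expected/actual: dicts mapping vmnic name -> MAC address string.
    The result maps each mismatched vmnic to (expected_mac, assigned_mac).
    """
    mismatches = {}
    for vmnic, mac in expected.items():
        assigned = actual.get(vmnic)
        if assigned != mac:
            mismatches[vmnic] = (mac, assigned)
    return mismatches

# Example: the adapter cabled to the leaf ends in :02, but :03 was assigned.
expected = {"vmnic2": "00:50:56:aa:bb:02"}
actual = {"vmnic2": "00:50:56:aa:bb:03"}  # wrong adapter mapped in vSphere
print(find_mismapped_uplinks(expected, actual))
# -> {'vmnic2': ('00:50:56:aa:bb:02', '00:50:56:aa:bb:03')}
```

An empty result would mean every uplink carries the expected adapter.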

 

Fully operational as intended.

 

Thanks for the information and suggestions along the way.  Truly appreciated.


8 REPLIES
Claudia de Luna
Rising star

Hi @bl80,

I suspect you have done so but have you sourced traffic from the VMs?

Robert Burns
Cisco Employee

If you see only "VMM" and not also "Learned" under the source, it means the fabric has not yet seen any traffic on the wire from that endpoint.  We only know about the endpoint from the VMM inventory information pushed from VC > APIC.  Are you sure there are no faults on the EPG in question?  Typically this is a Layer 1-2 issue, so you might have to check the L2 path between the Leaf & Hypervisor.  If there are any intermediate devices/switches in the path, be sure they have the VLAN assigned by the VMM created and allowed on all interfaces in the path.  If your hypervisor is directly connected to a leaf, you can skip this.

As Claudia suggested, try to source a ping from the VM to the BD SVI (create one if it doesn't yet exist).   I'd also try putting another VM on the same host and same port group and seeing if it can ping the VM locally.  This will at least tell you whether the VM's connectivity works on the vDS port group of the local host.

Robert

Thanks for the suggestions.  We do connect the VC interfaces on the chassis directly to a leaf.

I can confirm that two VMs on the same port group cannot ping each other or their gateway/BD IP.  

 

Something of note -- one of the VMs has been in this particular port group for a day or so, and it's showing the correct IP.  The other VM that I just moved into it in vCenter for this VM-to-VM testing is listed in the port group but showing its old IP address.  I have restarted the VM and refreshed everything I can think of.  Definitely unexpected behavior compared to the other fabric we have, where all of this integration is instantaneous.

 

I deleted the VMM association from the App Profile, deleted the Port Group in vCenter, and then re-added it all back in.  The VMs now show the correct, new IP, but the learning-source behavior is the same and they cannot ping each other or the gateway.

 

LLDP is the discovery protocol.  The VM host admins have told me they have gone through all the appropriate checks to ensure LLDP is configured as needed for this to function.  I know there are some considerations that have to be handled on the host side.  I'm not clear what I can do from the ACI side to prove this is a vCenter issue.  Feeling more like this is not a problem in the fabric...

 

 

Hi @bl80 ,

The bit where you say "two VMs on the same port group cannot ping each other or their gateway/BD IP", along with "but showing the old IP address" worries me.

And I'd like to see the result of @Robert Burns' suggestion:

"I'd also try to put another VM on the same host, same port group and see if that can ping the VM locally."

Your statement "cannot ping each other" does not make it clear if the VMs that can't ping each other are on the same host or not.

My thinking is that somehow there is an old association of a MAC address to IP address in ACI.

Given that this is a test environment, can you try issuing the following command on the APIC?

apic1# clear endpoints leaf <leaf_id> tenant <tenant_Name> vrf <VRF_Name>

There is a similar command for clearing endpoints from the BD too.

Mind you, if this is really the problem, it may have magically cleared itself by the time you read this.

And one more tip - to remove LLDP from the equation, when you associate the EPG with the VMM, make sure you choose Pre-provision as the Resolution Immediacy.

What I'd do next if you still have a problem is follow the access policy chain under Fabric > Access Policies, i.e. check that:

  • the Leaf Profile is connected to the correct Interface Profile
  • the Interface Profile has the right Interface Selector
  • the Interface Selector is associated with the correct Interface Policy Group
  • the Interface Policy Group is connected to the correct AAEP
  • the AAEP is connected to the right VMM Domain
  • the VMM Domain is connected to the right VLAN Pool
  • the VLAN Pool has the correct VLANs

I must write a full explanation one day!
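The access-policy chain-following exercise can be thought of as walking a linked list of objects and reporting the first missing link. The sketch below is purely conceptual (it is not APIC code, and all object names are invented) but it captures the idea: a single broken association anywhere in the chain stops the VLAN from being deployed.

```python
# Conceptual sketch only: model the ACI access-policy chain as nested dicts
# and walk it, reporting the first broken association. Names are invented.

CHAIN = [
    ("leaf_profile", "interface_profile"),
    ("interface_profile", "interface_selector"),
    ("interface_selector", "policy_group"),
    ("policy_group", "aaep"),
    ("aaep", "vmm_domain"),
    ("vmm_domain", "vlan_pool"),
]

def first_broken_link(config: dict):
    """Return the first (parent, child) pair whose link is missing, else None."""
    for parent, child in CHAIN:
        if config.get(parent, {}).get(child) is None:
            return (parent, child)
    return None

config = {
    "leaf_profile": {"interface_profile": "IntProf-Leaf101"},
    "interface_profile": {"interface_selector": "Sel-eth1-1"},
    "interface_selector": {"policy_group": "PG-ESXi"},
    "policy_group": {"aaep": "AAEP-VMM"},
    "aaep": {"vmm_domain": None},  # broken: AAEP not linked to the VMM domain
}
print(first_broken_link(config))  # -> ('aaep', 'vmm_domain')
```

In a real fabric you would verify each of these links by clicking through the objects under Fabric > Access Policies rather than running code, but the check is exactly this kind of chain walk.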

BTW @bl80  - the link to BRKACI-2645 in your original post has a stray underscore character at the end, (https://www.ciscolive.com/ ...<snip>.../BRKACI-2645.pdf_)  so the link needs editing.

 

RedNectar
aka Chris Welsh


Don't forget to mark answers as correct if it solves your problem. This helps others find the correct answer if they search for the same problem

Thanks again for replying ---

 

I can confirm that the VMs are on the same host in VCenter.  Sorry if that was not clear. Same Datacenter/Same cluster/Same host/Same Distributed Network Port/Same subnet defined.  Cannot ping each other, cannot ping the Gateway of the subnet.

 

We have 2 other port groups (different VRFs/tenants for those BDs/APs/uEPGs) with a few servers on each (most on different hosts), and they are behaving the exact same way.  We can see them appear when initially brought online in vCenter under Virtual Networking > VMM Domains > VMware > VDS > DVS > Port Groups > (the correct port group).

Looking in Operations > EP Tracker, I cannot find the endpoint by IP, but I can see it via its MAC address.
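The MAC-based lookup that EP Tracker performs can also be done against the APIC REST API, where learned endpoints are exposed as the `fvCEp` class. The sketch below only builds the class-query URL (the hostname and MAC are placeholders, and it is untested against a live fabric), so it can be checked without an APIC.

```python
# Sketch, untested against a live fabric: build the APIC REST class query that
# returns a learned endpoint (fvCEp) by MAC address. Hostname/MAC are
# placeholders; in practice you would authenticate first and GET this URL.

def fvcep_query_url(apic_host: str, mac: str) -> str:
    """Build the APIC class-query URL that filters fvCEp objects by MAC."""
    return (
        f"https://{apic_host}/api/node/class/fvCEp.json"
        f'?query-target-filter=eq(fvCEp.mac,"{mac}")'
    )

url = fvcep_query_url("apic1.example.com", "00:50:56:AA:BB:CC")
print(url)
```

If the endpoint shows up here with no IP child object, that matches the "VMM only, not Learned" symptom: the fabric knows the MAC from inventory but has not learned an IP from data-plane traffic.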

 

I performed the clear endpoints leaf command and no change.

 

We use uEPGs exclusively throughout our fabric.  From what I understand, uEPGs do not support pre-provision; I get a warning dialog when even attempting to set it.  I am not clear on the exact steps to test pre-provision with the VMM integration.  Our deployment depends fully on the automatic provisioning of VMs into the appropriate uEPGs.  If there is a way to do some testing with pre-provision, please let me know the process and I can try it.  I assume it would mean forgoing the uEPG and setting up a static endpoint under the App EPG?  This is something I have never done, and I am not clear on the exact steps/settings.

 

I can confirm that all elements of physical interface -> PC -> profile -> AEP -> VMM Domain -> etc. are correct.  Thankfully, we have a fully working production environment with identical settings and configurations.  Definitely odd what we are seeing here.

OK, the plot thickens.  uSeg makes a huge difference here in the expected outcomes.  Let's dig into this more.

  • On the uSeg EPG that VMs are showing up under as learned, is Intra-EPG isolation enabled? 
  • Are both VMs you're testing between showing up under the same uSeg EPG (Operational > Client Endpoint)?

With uSeg, only the Base (regular) EPG is presented to vCenter as a port group, but on the Leaf, the traffic is re-classified into the uSeg EPG based on the matching criteria (Network or VM attribute).  This can look confusing, as you might expect VMs in the same port group to be able to communicate.  This is the purpose of PVLANs with uSeg: when Intra-EPG isolation is enabled, all traffic is punted to the Leaf so the policy can be enforced.  There is no local vDS port group switching in this case.
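The re-classification described above can be illustrated with a tiny sketch: an endpoint arrives in the base EPG's port group, and the leaf assigns it to the first uSeg EPG whose attribute criteria match. This is conceptual only (not APIC logic), and every EPG name, attribute, and value below is invented.

```python
# Illustrative sketch (not APIC code): with uSeg, the leaf re-classifies an
# endpoint from its base EPG into a uSeg EPG by matching attributes such as
# IP, MAC, or VM name. All names and criteria below are invented examples.

def classify_endpoint(ep: dict, useg_rules: list) -> str:
    """Return the first matching uSeg EPG name, else the base EPG name.

    useg_rules: list of (epg_name, attribute, required_value) tuples,
    evaluated in order, mimicking attribute-based matching.
    """
    for epg_name, attr, value in useg_rules:
        if ep.get(attr) == value:
            return epg_name
    return ep["base_epg"]

rules = [
    ("uSeg-Web", "vm_name_prefix", "web"),
    ("uSeg-DB", "vm_name_prefix", "db"),
]
ep = {"base_epg": "EPG-App", "vm_name_prefix": "web"}
print(classify_endpoint(ep, rules))  # -> uSeg-Web
```

The key point the sketch captures: two VMs sharing a port group can land in different uSeg EPGs, so same-port-group membership in vCenter does not by itself imply they share policy in the fabric.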

The remaining strange issue is that this does not account for either VM's inability to reach the GW on the fabric.  That traffic will always be implicitly permitted.

You might also (as a test) want to try regular EPGs (remove uSeg from the picture) and see if that works.  If it does, then you've isolated your issue somewhat.  If not, then you likely have some strange host-level issue to investigate.  Part of debugging is working backwards until you return to a working state; then you can re-introduce config (uSeg) and determine where it's misconfigured.

One other comment caught my attention.  You said "I deleted the VMM association from the App Profile, deleted the Port Group in VCenter and then re-added it all back in."  << I hope you're not manually deleting port groups on the APIC-managed vDS from vCenter directly.  These will auto-clean up & remove if/when you remove the VMM domain binding from the EPG, assuming there are no VM interfaces assigned to the port group.  **vCenter can't delete a port group if it's assigned to a VM.

Robert


Great to hear you resolved it!  Lesson learned: always check (or have the server team check) your CDP/LLDP info when adding vmnics as uplinks.  If they don't show your ACI leaf switch, it's the wrong one.

Cheers,

Robert
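The "lesson learned" above can be expressed as one final sanity check: for each vmnic being added as an uplink, confirm its CDP/LLDP neighbor is one of the expected ACI leaves. The sketch below is hypothetical (switch and vmnic names are invented); in practice the neighbor data would come from the vSphere client or the host's network tooling.

```python
# Hedged sketch: flag vmnic uplinks whose CDP/LLDP neighbor is not one of the
# expected ACI leaf switches. All vmnic and switch names below are invented.

def wrong_uplinks(neighbors: dict, expected_leaves: set) -> list:
    """Return vmnics whose discovered neighbor is not an expected leaf.

    neighbors: dict mapping vmnic name -> neighbor device name from CDP/LLDP.
    """
    return [nic for nic, neighbor in neighbors.items()
            if neighbor not in expected_leaves]

neighbors = {
    "vmnic2": "Leaf-101",       # correct: cabled to a fabric leaf
    "vmnic3": "OldCoreSwitch",  # wrong: not an ACI leaf in this fabric
}
print(wrong_uplinks(neighbors, {"Leaf-101", "Leaf-102"}))  # -> ['vmnic3']
```

An empty list means every candidate uplink really faces the fabric, which is exactly the condition that was violated in this thread's root cause.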