4410 Views · 15 Helpful · 41 Replies

VMM - Either the EpG is not associated with a domain

dnoc43
Level 1
We've set up a VMM domain in our ACI environment according to the attached diagram. We have 2 links going into Leaf161/171 (Phy Dom AAEP) and 2 links going to Leaf163/173 (VMM Dom AAEP). Leaf161/171 carry static ports for vMotion, storage, iSCSI, and mgmt. Leaf163/173 are just for the dynamic VLANs for VMM. After completing the VMM configuration everything works fine. However, we're getting config-failed alerts when we move VMs to the VMM VDS. When I look at the EPG members I see that ACI is trying to map the dynamic VLANs to Leaf161/171, which is not part of the VMM. How is it learning these interfaces? Is there a way to not map the dynamic VLANs on those interfaces?

vmm_vlan_issue.png
vmm_vlan_issue_02.png
Inkedvmm_vlan_issue_03_LI.jpg
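[For anyone hitting the same thing: the "config-failed" alerts here surface as F0467 faults (the code that comes up later in this thread), and you can pull them all in one query instead of browsing each EPG. A minimal sketch against the APIC REST API - the APIC address and credentials are hypothetical placeholders:

# Minimal sketch: pull every F0467 (config-failed) fault from the APIC
# REST API to see exactly which node/encap pairs are failing.
import requests

APIC = "https://apic.example.com"   # hypothetical
session = requests.Session()
session.verify = False              # lab only; use a valid cert in production

# Standard APIC REST login
login = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}
session.post(f"{APIC}/api/aaaLogin.json", json=login)

# All F0467 fault instances in the fabric
resp = session.get(
    f"{APIC}/api/node/class/faultInst.json",
    params={"query-target-filter": 'eq(faultInst.code,"F0467")'},
)
for item in resp.json()["imdata"]:
    attrs = item["faultInst"]["attributes"]
    print(attrs["dn"])
    print("   ", attrs["descr"])
]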
2 Accepted Solutions

Robert Burns
Cisco Employee

The UCSM side is programmed fine.  Disjoint L2 & pin groups are accurate.  The problem is on the ACI side.  I don't know 100% why yet, but ACI will receive discovery info from all UCS FI interfaces, and this will tell the fabric that the FIs are a looseNode (lsnode).  I'm thinking that since VMM is tied to some of the interfaces already, it's also trying to bind the VMM to the phys-dom leaves as well, since the same "looseNode" is also seen on those interfaces.  I don't often see two separate AAEPs & domains from the same FIs, so I can't confirm if this is the reason.  Usually for this, I'd set up a single AAEP for all UCS interfaces, linked to two domains (one physical and one VMM).  The AAEP is really representative of a single networking environment, which UCSM is: one environment (it's masquerading as a single host with multiple MACs behind it).  I'd have to lab this up to confirm, but I don't have a UCSM + ACI environment with multiple uplink sets to test this.  I'll keep digging around and see if this might explain things.
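[If you want to see what the fabric has discovered as looseNodes, you can query the fabricLooseNode class (the object behind Fabric > Inventory > Unmanaged Fabric Nodes) - the FIs should appear there. A sketch continuing with the authenticated session from the earlier F0467 example; the attribute names are my best guess at the model, so verify on your version:

# Sketch: list the unmanaged fabric nodes (looseNodes) that LLDP/CDP
# discovery has created -- the UCS FIs should show up here. Reuses the
# hypothetical `session` and `APIC` from the earlier example.
resp = session.get(f"{APIC}/api/node/class/fabricLooseNode.json")
for item in resp.json()["imdata"]:
    attrs = item["fabricLooseNode"]["attributes"]
    # sysName/sysDesc are assumed attribute names; .get() keeps this safe
    print(attrs["dn"], attrs.get("sysName", ""), attrs.get("sysDesc", ""))
]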

Robert 


Hi @dnoc43 & @Robert Burns ,

Does this sound like an explanation?
 https://quickview.cloudapps.cisco.com/quickview/bug/CSCvt02685 (or
 https://bst.cisco.com/bugsearch/bug/CSCvt02685 for complete description)

Note the following:

  • Last Modified Feb 02, 2023
  • Known Fixed Releases (0)

Workaround:

1) Acked Faults | Hide Acked Faults | Hide Delegated Faults

2) Ignore Faults

3) Use testapi to delete the F0467 faults. In the lab, we see the faults coming back after the APIC reload.

4) Only enable Discovery protocol in the VMM Facing Ports towards the lsnode.

I'm not 100% sure and I'm too lazy to look back, but I had thought that you'd already done 4) above.
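[For anyone scripting workaround 1, bulk-acknowledging the faults might look like the sketch below. I'm assuming faultInst.ack is settable through the normal MO API (that appears to be what the GUI's Acknowledge action does) - verify before running this against production. Same hypothetical session as the first example:

# Sketch of workaround 1: acknowledge every F0467 fault so that
# "Hide Acked Faults" suppresses them in the GUI. Assumes faultInst.ack
# is writable via the MO API.
resp = session.get(
    f"{APIC}/api/node/class/faultInst.json",
    params={"query-target-filter": 'eq(faultInst.code,"F0467")'},
)
for item in resp.json()["imdata"]:
    dn = item["faultInst"]["attributes"]["dn"]
    payload = {"faultInst": {"attributes": {"dn": dn, "ack": "yes"}}}
    session.post(f"{APIC}/api/node/mo/{dn}.json", json=payload)
]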

RedNectar aka Chris Welsh.
Forum Tips: 1. Paste images inline - don't attach. 2. Always mark helpful and correct answers, it helps others find what they need.


41 Replies

RedNectar
VIP

Hi @dnoc43 ,

Firstly - THANK YOU for the diagrams and the clear explanation of your problem. That's why I marked your Q as Helpful.

My bet is that

  1. both your VMM Domain and your Phys Domain are linked to the same AAEP, and
  2. The VMM domain association on the EPG has the Resolution Immediacy marked as Pre-provision

Now, changing either one of these should fix your problem, but let me suggest that the BETTER way is to create a separate AAEP for the management and vMotion functions, and keep the VMs isolated to their own AAEP and VLAN pool.
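[A quick way to audit point 2 across every EPG at once, rather than clicking through each domain association: query the fvRsDomAtt class (the EPG-to-domain relation) and look at its resImedcy attribute. Continuing with the hypothetical session from the first sketch:

# Sketch: list every EPG-to-domain association and its Resolution
# Immediacy. resImedcy is "pre-provision", "immediate" or "lazy"
# (lazy == On Demand in the GUI).
resp = session.get(f"{APIC}/api/node/class/fvRsDomAtt.json")
for item in resp.json()["imdata"]:
    attrs = item["fvRsDomAtt"]["attributes"]
    print(attrs["dn"], "->", attrs["resImedcy"])
]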

 

RedNectar aka Chris Welsh.
Forum Tips: 1. Paste images inline - don't attach. 2. Always mark helpful and correct answers, it helps others find what they need.

Thanks for the reply Chris.  I double-checked the AAEPs on the physical domain and the VMM domain; they are different AAEPs.  I also checked the IPGs for both VPCs and they are associated with the correct corresponding AAEP.  My Resolution Immediacy was Pre-provision; I updated them all to On Demand.  That didn't seem to resolve the issue.

Hi @dnoc43 ,

OK. So go to the EPG, then to [Operational] > [Configured Access Policies] or, if you have the Policy Viewer plugin installed, that may be a better option. [Edit: 2023.04.08 - updated diagram so both diagrams match]

RedNectar_0-1680954460560.png

 

RedNectar_1-1680764685175.png

From one of these two views, you should be able to see the problem.  I hope!
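[You can also pull the same relationships from the API: each AAEP is an infraAttEntityP object whose infraRsDomP children point at the attached domains, so a dump like the sketch below (same hypothetical session as before) makes a shared AAEP easy to spot:

# Sketch: print each AAEP and the domains attached to it, to confirm the
# physical and VMM domains really live on separate AAEPs.
resp = session.get(
    f"{APIC}/api/node/class/infraAttEntityP.json",
    params={"rsp-subtree": "children", "rsp-subtree-class": "infraRsDomP"},
)
for item in resp.json()["imdata"]:
    aep = item["infraAttEntityP"]
    print("AAEP:", aep["attributes"]["name"])
    for child in aep.get("children", []):
        print("  domain:", child["infraRsDomP"]["attributes"]["tDn"])
]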

 

RedNectar aka Chris Welsh.
Forum Tips: 1. Paste images inline - don't attach. 2. Always mark helpful and correct answers, it helps others find what they need.

Forgot to tag you on response.

dnoc43
Level 1
Level 1

Very odd! When I attach the phy dom to the EPG it shows every IPG I have created even though no static ports are assigned. Is that normal behavior? 

vmm_vlan_issue_04.png

 

I've collapsed the phy domain AAEP to show the full policy.

vmm_vlan_issue_05.png

Hi @dnoc43 ,

OK. I've had a bit more of a look - I tried to lab it, but I can't hook up an ESXi in a VPC ATM, so I just added VMs to the design I had before.  BUT...

Very odd! When I attach the phy dom to the EPG it shows every IPG I have created even though no static ports are assigned. Is that normal behavior? 

YES - this is normal behaviour IF you have mapped the AMDC-AAEP up to the TEST2_EPG for a particular VLAN (this effectively assigns EVERY IPG linked to the AAEP to the EPG).
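[For reference, that AAEP-to-EPG mapping is, as far as I can tell, stored as infraRsFuncToEpg objects under the AAEP, so you can list every such mapping (and its VLAN) with a query like this sketch. Class and attribute names are assumptions to verify on your version; session as in the first example:

# Sketch: list EPGs mapped directly on an AAEP -- the mapping that makes
# every IPG on the AAEP show up under the EPG. Class/attribute names
# (infraRsFuncToEpg, encap) are assumed from the MIT; verify first.
resp = session.get(f"{APIC}/api/node/class/infraRsFuncToEpg.json")
for item in resp.json()["imdata"]:
    attrs = item["infraRsFuncToEpg"]["attributes"]
    print(attrs["dn"], "encap:", attrs.get("encap", ""))
]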

On the other hand, linking two AAEPs to the same Phys Dom is NOT normal behaviour - at least NOT best practice, although it is technically possible to do so.

RedNectar_1-1680956065498.png

I reckon that if you can collapse your two AAEPs into a single AAEP (or, if that's not possible, create a new PhysDom with the same VLAN pool if necessary, add that PhysDom association to the TEST2_EPG, and then make sure there is one PhysDom per AAEP), your conundrum from above may fix itself.

Let me know how it goes.

 

RedNectar aka Chris Welsh.
Forum Tips: 1. Paste images inline - don't attach. 2. Always mark helpful and correct answers, it helps others find what they need.

If I do an endpoint lookup on a known MAC address of a VM on the VMM, this is what I see.  It's still being learned from 162/172??  Since this is a test environment, my next step was to disable the vMotion/storage/mgmt VPC ports on 162/172.  Then I did the endpoint search again. It's still there??

vmm_vlan_issue_06.png
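[The same lookup is scriptable: endpoints are fvCEp objects keyed by MAC, and their fvRsCEpToPathEp children carry the learned path(s), so any unexpected VPC/leaf shows up in the tDn values. A sketch using the MAC from the screenshots (APIC stores MACs in uppercase) and the hypothetical session from the first example:

# Sketch: look up an endpoint by MAC and print every path the fabric
# has it on.
mac = "00:50:56:A0:56:31"   # MAC from the screenshots; adjust as needed
resp = session.get(
    f"{APIC}/api/node/class/fvCEp.json",
    params={
        "query-target-filter": f'eq(fvCEp.mac,"{mac}")',
        "rsp-subtree": "children",
        "rsp-subtree-class": "fvRsCEpToPathEp",
    },
)
for item in resp.json()["imdata"]:
    cep = item["fvCEp"]
    print(cep["attributes"]["dn"])
    for child in cep.get("children", []):
        print("  path:", child["fvRsCEpToPathEp"]["attributes"]["tDn"])
]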

Hi @dnoc43 ,

I'm beginning to wonder if the problem MAY be with the bits of the diagram that you didn't include.  Based on your naming, I'm assuming your physical setup looks more like this in reality (no wonder you didn't put all the detail in!)

RedNectar_1-1681351192762.png

The thing is that your VPCs are NOT connected to ESXi hosts, but to FIs

Now, within the ESXi hosts, the VMM vSwitches will send traffic to either FI-A or FI-B (based on source virtual port, which means the same MAC always goes to the same FI).

And UNLESS YOU ARE USING DISJOINT VLANS on the FIs, the FIs will load balance on all uplink ports, which in your case is the two VPCs! And that would explain perfectly why you are seeing traffic from a non-expected VLAN on one or both of the VPCs

So I think your next troubleshooting step would be to look at UCS Manager, and check to see how your uplinks are configured

In UCS Manager, that's Equipment > Fabric Interconnects > Fabric Interconnect A (or B), then click the LAN Uplinks Manager from the General tab.
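[If you end up checking this regularly, the UCSM Python SDK (ucsmsdk) can pull the same information as the LAN Uplinks Manager. The class IDs below (FabricEthLanPc for uplink port-channels, FabricVlan for VLANs) are my best reading of the UCSM model, and the host/credentials are hypothetical, so treat this as a sketch:

# Sketch: list uplink port-channels and find VLAN 2518 via the UCSM SDK,
# as a scriptable alternative to the LAN Uplinks Manager. Class IDs and
# attribute names are assumptions to verify against your UCSM version.
from ucsmsdk.ucshandle import UcsHandle

handle = UcsHandle("ucsm.example.com", "admin", "password")  # hypothetical
handle.login()

# Uplink port-channels on both fabrics
for pc in handle.query_classid("FabricEthLanPc"):
    print("uplink PC:", pc.dn)

# The VLAN under discussion (2518) -- its dn shows where it is defined
for vlan in handle.query_classid("FabricVlan"):
    if vlan.id == "2518":
        print("VLAN 2518:", vlan.dn)

handle.logout()
]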

 

RedNectar aka Chris Welsh.
Forum Tips: 1. Paste images inline - don't attach. 2. Always mark helpful and correct answers, it helps others find what they need.

Yes, sorry - you're 100% correct. That's exactly how it's wired up.  The only difference is we have pinning set up in UCS to pin specific VLANs to port-channels.  My concern is why the APIC is still pointing to a VPC IPG that's shut down.

Inkedvmm_vlan_issue_06_LI.jpg

If I look at LEAF162 I see only a remote EP pointing to a tunnel interface destined for LEAF173

vmm_vlan_issue_07.png

 

Hi @dnoc43 ,

You say

we have pinning setup in UCS to pin specific VLANs to port-channels.

that is also known as Disjoint VLANs in UCS speak.  But given the behaviour that you are seeing, I'd be checking the UCS configuration carefully to see that the UCS pinning/disjoint VLANs are set up correctly.

From the ACI Leaf side of the story, the way VPC switch pairs work is that any MAC address learned on one switch is immediately learned by the other, because the learning switch sends a message to its partner switch - but it should NOT appear as being at the end of a tunnel like your image above.

But back to the UCS Pinning!

Side-Thought  Have you checked that the MAC address is in fact unique?  Because if the same MAC existed on both ESXi1 and ESXi2, that could explain this behaviour.

Starting on one of the ESXi hosts (it doesn't matter which one - notwithstanding my side-thought above), I'll assume the MAC address 00:50:56:a0:56:31 belongs to a VM - it certainly LOOKS like a VM MAC.

The VM sends a frame which will arrive at the VMM vSwitch on one of the ESXi hosts on a virtual port.

Side-Thought #2  Have you checked that the VMM vSwitch is configured for Route based on the originating virtual port?

Assuming that load-balancing on the vSwitch has been configured for Route based on the originating virtual port, the vSwitch encapsulates the frame with an 802.1Q tag of 2518 and should ALWAYS send it towards the same FI. To continue the model, let's assume it is FI-A.
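[Side-Thought #2 is easy to confirm programmatically: in the vSphere API, "Route based on originating virtual port" shows up as the teaming policy value loadbalance_srcid on the dvPortgroup. A sketch with pyVmomi; the vCenter address and credentials are hypothetical:

# Sketch: print the uplink teaming policy of every dvPortgroup; expect
# "loadbalance_srcid" for Route based on originating virtual port.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab only
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.dvs.DistributedVirtualPortgroup], True)
for pg in view.view:
    teaming = getattr(pg.config.defaultPortConfig, "uplinkTeamingPolicy", None)
    if teaming is not None:
        print(pg.name, "->", teaming.policy.value)
view.Destroy()
Disconnect(si)
]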

FI-A gets the frame, and based on the VLAN grouping for the pinning of VLAN 2518, will always send it towards Leaf163 OR Leaf173.  This time, the load balancing will be based on LACP (Fabric Interconnects don't give you a choice)  - but that shouldn't matter.  Remember both Leaf163 AND Leaf173 share knowledge of that MAC address - or SHOULD share that knowledge. 

BUT THIS IS NOT WHAT YOU SEE

Somehow, MAC address 00:50:56:a0:56:31 has turned up on Leaf162 or Leaf172 (which were NOT in your original picture) so it seems there is yet another set of uplinks from the FIs to ACI

Again - this leads me to say - CHECK THE MAC PINNING ON THE FIs.  Somehow one of the FIs MUST have sent a packet up the Leaf162/172 VPC link with a source MAC of 00:50:56:a0:56:31 and an 802.1Q tag of 2518.

RedNectar aka Chris Welsh.
Forum Tips: 1. Paste images inline - don't attach. 2. Always mark helpful and correct answers, it helps others find what they need.

When I add a VMM domain to an EPG, how are the EPG members dynamically learned? As soon as I add the VMM, all my leaf switches are added as dynamic EPG members. This might help point me in the correct direction.

Hi @dnoc43 ,

When I add a VMM domain to an EPG, how are the EPG members dynamically learned? As soon as I add the VMM, all my leaf switches are added as dynamic EPG members. This might help point me in the correct direction.


RedNectar aka Chris Welsh.
Forum Tips: 1. Paste images inline - don't attach. 2. Always mark helpful and correct answers, it helps others find what they need.

Chris, thanks so much for the video. It did help with the learning source.  Which brings me to today's question.

I've noticed that as soon as I attach a VM to the DVS, I see this in the Client Endpoints of the EPG.  Notice the 2 incorrect VMM paths and the correct learned path once it sees a packet.  The two incorrect paths are port-channels connected to the same FI.  I guess ACI is assuming these are also valid paths to the VMware hosts?  Is there a way to prevent this learning?  The two incorrect paths don't go into "learning" because they never receive a packet on those interfaces.

vmm_vlan_issue_08.png

Below is the layout for one of the two FIs.  Neither LEAF1 should be there.

vmm_vlan_issue_09.png

Hi @dnoc43 ,

I have to say I'm getting a bit confused. Maybe it's just the naming you are using.  From your original posts there were the following VPC pairs of leaf switches:

  • Leaf161 & Leaf171
  • (Leaf162 & Leaf172 - not specifically mentioned, but present all the same and visible in later posts)
  • Leaf163 & Leaf173

Now, from FI-A's point of view there are:

  • AMDC-0116-LEAF1
  • AMDC-0116-LEAF3
  • AMDC-0117-LEAF1
  • AMDC-0117-LEAF3

I'm having a conceptual problem resolving the two naming systems.


Now, the bits that worry me most are:

  1. The fact that there are two AAEPs linked to the AMDC-PHY-DOM as shown in this post
    • My advice was to collapse your two AAEPs into a single AAEP or, if that's not possible, create a new PhysDom (with the same VLAN pool if necessary), add that PhysDom association to the TEST2_EPG, and then make sure there is one PhysDom per AAEP
    • I also believe that this may be the cause of the faults that appear under your EPG (for VLAN 2518 on nodes 161 and 171)
  2. Have you confirmed that the VLAN mappings on the Fabric Interconnects are correct?
    • Because you have evidence that packets on VLAN 2518 are arriving on Leaves 162 and 172, as shown here and here
RedNectar aka Chris Welsh.
Forum Tips: 1. Paste images inline - don't attach. 2. Always mark helpful and correct answers, it helps others find what they need.
