The importance of your management network in an ACI world

cooperb01
Level 1

Hi

A few months back the vCenter license in my lab expired. As expected, all the hosts showed as offline, but my VMs were still running.

The APIC was configured with my VMM domain and I was using the "on-demand" option for deployment, so EPGs only get created on the leafs where necessary, based on what the APIC learns from vCenter. When the license expired, the APIC learnt that all the hosts were offline and removed all the EPGs.

Now, I know it's very unlikely that organisations will run temporary licenses for production services, and I don't believe you can buy a vCenter license that expires unless it's an eval license, so this is not really an issue. It is, however, a good example of how the different deployment options work within ACI.

This then started me thinking about what potential issues I could have if connectivity between the APIC and vCenter (VC) goes down because of OOB network issues.

For instance, the APIC cluster is up and can see the fabric, and vCenter is working and can see all the hosts, but the OOB network that connects VC to the APIC has issues (a ToR failure, etc.) and connectivity between VC and the APIC is lost.

My question is this: what happens when DRS (vMotion) moves a VM to a host that is connected to a leaf that does not have the EPG configured? I would assume that because the APIC cannot learn of the move from VC, the EPG would not get created and therefore the VM would be disconnected from the network.

If this is the case, it is vital that the OOB network is fully resilient and that the APIC and VC have dual uplinks connecting to different ToR switches.

Thanks

Ben


dpita
Cisco Employee

Hello

That's a good question!

Under a normal migration, when a VM moves due to vMotion onto a leaf that does not have those EPGs and VLANs programmed, they will be deployed immediately. The vCenter/ESXi host will send a GARP to ACI, the old leaf will bounce traffic to the new location of the endpoint, and traffic/learning will occur. The bounce entry will stick around for a bit (about 5 minutes) and then be removed. The EPGs, VLANs, and default gateway will be deployed as soon as the move is detected, and there will be little to no downtime (I usually see 0-1 pings lost; most of the time it is just increased latency).

This is, of course, under the assumption that the ESXi host is connected to the ACI fabric and access policies are pre-provisioned.

Now, regarding how ACI will react if OOB between the APICs and vCenter is down, I'm not sure what will happen. Definitely a good test to do in the lab; I will probably be testing this in my spare time!

Thank you for posting! 

Thanks.

 

Did you manage to test this out?

Hello,

I have not had a chance to recreate this scenario of migrating a VM while the APICs are disconnected from vCenter through OOB. As soon as I am able to access my lab physically, I will test this out and report back on the thread.

Hello! 

I finally have a report to give you. Finished testing in the lab. 

If OOB between the three APICs and vCenter goes down and a vMotion is then initiated, this is the result:

  • If the VMM domain is associated to the EPG as on-demand/on-demand, then the endpoint will not be learnt, nor will the interface be programmed with that EPG's/port group's VLANs.
  • If the VMM domain is associated to the EPG as immediate/immediate, then the endpoint will be learned and everything will be programmed as expected.
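
The two results above can be written down as a small decision function. This is only a sketch of the observed lab behavior, not of how the APIC is implemented. It assumes the "x/y" notation means resolution immediacy/deployment immediacy, and the names `Immediacy`, `VmmAssociation` and `epg_programmed_after_vmotion` are made up for illustration. Mixed combinations were not part of the test, so the sketch refuses to guess at them.

```python
from dataclasses import dataclass
from enum import Enum


class Immediacy(Enum):
    ON_DEMAND = "on-demand"
    IMMEDIATE = "immediate"


@dataclass(frozen=True)
class VmmAssociation:
    """Hypothetical stand-in for a VMM-domain-to-EPG association."""
    resolution: Immediacy   # assumed to be the first half of "x/y"
    deployment: Immediacy   # assumed to be the second half of "x/y"


def epg_programmed_after_vmotion(assoc: VmmAssociation,
                                 apic_reaches_vcenter: bool) -> bool:
    """Return True if the destination leaf is expected to get the EPG's VLANs.

    Encodes only what the thread reports: immediate/immediate survived the
    OOB outage, while on-demand/on-demand needs APIC-to-vCenter connectivity.
    """
    both_immediate = (assoc.resolution is Immediacy.IMMEDIATE
                      and assoc.deployment is Immediacy.IMMEDIATE)
    both_on_demand = (assoc.resolution is Immediacy.ON_DEMAND
                      and assoc.deployment is Immediacy.ON_DEMAND)
    if both_immediate:
        return True
    if both_on_demand:
        return apic_reaches_vcenter
    raise ValueError("mixed immediacy combinations were not covered by the lab test")


if __name__ == "__main__":
    on_demand = VmmAssociation(Immediacy.ON_DEMAND, Immediacy.ON_DEMAND)
    immediate = VmmAssociation(Immediacy.IMMEDIATE, Immediacy.IMMEDIATE)
    # OOB outage: the APIC cannot reach vCenter while a vMotion happens.
    print("on-demand programmed:", epg_programmed_after_vmotion(on_demand, False))
    print("immediate programmed:", epg_programmed_after_vmotion(immediate, False))
```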

A bug will be filed on the first scenario; I am organizing the output from the logs before I file it.

Thank you for bringing this failure scenario to light!

What other questions do you have?

Hi 

Thank you for testing this.

 

Can you confirm what the expected behavior is after the bug is fixed?

I understand that on-demand EPGs will be created if:

1) The EP sends traffic out of its NIC and the attached leaf learns the MAC and IP.

2) vCenter can inform the APIC of the attached hosts and the VMs deployed on them.

Is there a priority between these learning types?

 

Ben

Hey Ben,

We spoke with Development and they came back saying that this is expected behavior. With on-demand mode, we require vCenter-to-APIC communication to be up so that we are notified when a VM comes online on a host and can program VLANs on the connected interfaces.

 

We won't program based on received packets alone; we also need to have been notified via out of band.

 

If this scenario is a worry in your environment, you will have to choose "immediate" as the deployment mode.

 

Fortunately, there is only one condition in which you will have this problem: the OOB switch infrastructure is having issues while the APIC and vCenter are both up and trying to talk. Hopefully that is a problem that doesn't happen often.
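
To find out which EPGs would be exposed to this, one option is to list the VMM domain associations that are still set to on-demand. The sketch below only parses a JSON response that has already been fetched. It assumes the associations are returned as `fvRsDomAtt` objects (for example from a class query such as `/api/node/class/fvRsDomAtt.json`), that on-demand appears as the value `lazy` in the `resImedcy` and `instrImedcy` attributes, and that VMM domain targets have a `tDn` starting with `uni/vmmp-`. Please verify all of these against the object model of your APIC version. The tenant, EPG and domain names in the sample are invented.

```python
import json


def on_demand_vmm_associations(response_text: str) -> list[str]:
    """Return the DNs of VMM domain associations that use on-demand immediacy.

    Assumes an APIC-style payload: {"imdata": [{"fvRsDomAtt": {"attributes": {...}}}]}.
    """
    payload = json.loads(response_text)
    flagged = []
    for item in payload.get("imdata", []):
        attrs = item.get("fvRsDomAtt", {}).get("attributes", {})
        # Assumption: only VMM domains matter here, not physical domains.
        if not attrs.get("tDn", "").startswith("uni/vmmp-"):
            continue
        # Assumption: "lazy" is how on-demand is encoded in the object model.
        if attrs.get("resImedcy") == "lazy" or attrs.get("instrImedcy") == "lazy":
            flagged.append(attrs.get("dn", "<unknown dn>"))
    return flagged


if __name__ == "__main__":
    # Invented sample data, shaped like the assumed response format.
    sample = json.dumps({
        "imdata": [
            {"fvRsDomAtt": {"attributes": {
                "dn": "uni/tn-Lab/ap-App1/epg-Web/rsdomAtt-[uni/vmmp-VMware/dom-LabVC]",
                "tDn": "uni/vmmp-VMware/dom-LabVC",
                "resImedcy": "lazy",
                "instrImedcy": "lazy"}}},
            {"fvRsDomAtt": {"attributes": {
                "dn": "uni/tn-Lab/ap-App1/epg-DB/rsdomAtt-[uni/vmmp-VMware/dom-LabVC]",
                "tDn": "uni/vmmp-VMware/dom-LabVC",
                "resImedcy": "immediate",
                "instrImedcy": "immediate"}}},
        ]
    })
    for dn in on_demand_vmm_associations(sample):
        print("on-demand:", dn)
```

Keeping the parsing separate from the HTTP call means the check can be run against a saved query result without touching the APIC.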

 

Joey
