experience / trouble with Nexus1000v deployment

Lukas Mazur · 01-20-2010

hi all,

i want to share the experiences / troubles i had during a N1KV deployment last week.

initial setup:

* 9 ESX Servers in 3 Clusters (2x4 and 1x1)

* virtual vCenter Server

* iSCSI Storage without Multipathing

* SC, VMKernel on vSwitches

* VMTraffic on VMware vDS

* 6 NICs per Host

* SC / VMKernel on a vSwitch with 2 NICs

* iSCSI on a vSwitch with 2 NICs

* VMtraffic on a vDS with 2 NICs

This looks like a more or less standard VMware deployment.

1. Installation of the VSMs:

First try, using the GUI Installer: the initial deployment of the VM from the .ova / .ovf went without any problems. After launching the Java Setup tool from the VSM and filling in some fields, the Installer could not find the Nexus VSM VM and displayed the message: This VM is not a VSM (or something similar). At this point we started over using the Console Installer and found out that the password we had chosen (8 characters: upper-case letter, lower-case letter and number) was in a "dictionary". The GUI Installer had accepted it in phase 1 but could not log in during the second phase.

2. Installation of the VEM on host running vCenter.

This simply didn't work - after moving the vCenter to another Host there was no problem.

3. Troubles installing VEM Module.

During the migration, 4 hosts were moved to N1KV. With 2 of them we had big issues that led to reinstalling the ESX host itself. There were connectivity issues with the SC and iSCSI. After some troubleshooting we found out that the SC and VMKernel (iSCSI) were bound to both the Nexus1000v and the vSwitch. That was the main reason for reinstalling the ESX hosts. Have any of you experienced similar troubles in this deployment step? Note also that 2 of the 4 hosts worked as they should and only 2 had the described issues. After reinstallation everything worked as it should.

4. vPC-HM with CDP

vPC Configuration with mode cdp worked without any issues (connecting to Catalysts).

5. Migration

After the configuration of all necessary port-profiles (Uplink & VMware) we deployed the VEM on one host, and in this step we migrated one NIC from each vSwitch / vDS to Nexus and assigned the correct uplink port profile. Then we started migrating SC, VMotion and iSCSI to Nexus. A critical point was the iSCSI VMKernel interface: you have to migrate all VMs away from the host first. After verifying each function (SC, VMotion, iSCSI) we moved the second NICs to Nexus.

Are there any suggestions/guides on moving vCenter to a Nexus-only host? When vCenter is shut down you cannot assign a port-group on a vDS / Nexus to it.

i hope that i could give you guys an overview.

BR Lukas

nenduri · 01-20-2010

Hi Lukas,

I would recommend the following changes to vPC-HM configuration.

1. Instead of 'sub-group cdp' use 'mac-pinning' as below

channel-group auto mode on mac-pinning

2. Remove the channel configuration on upstream switch ports that are connected to VEM.

With 'sub-group cdp', the ports will not be in forwarding state when CDP information is unavailable. That's the reason for the recommendation above.
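
For reference, a minimal sketch of what an uplink port profile with mac-pinning could look like. The profile name and VLAN IDs are placeholders, not values from this deployment. The `type ethernet` keyword is an assumption about the release in use; older releases may mark uplink profiles with `capability uplink` instead.

```
! Hypothetical uplink port profile; name and VLAN IDs are placeholders
port-profile type ethernet system-uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 10,20,30,40
  ! vPC-HM with mac-pinning: no dependency on CDP from the upstream switch
  channel-group auto mode on mac-pinning
  no shutdown
  state enabled
```

With a profile like this, the upstream switch ports stay plain trunk ports without any channel configuration, as described in step 2.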

thanks,

Naren

Lukas Mazur · 01-25-2010

hi Naren,

Thank you for the recommendation.

with manual assignment of subgroups to the interfaces i also won't be dependent on CDP.

with subgroup cdp i would just have problems when a switch reboots or during first channel setup (or in a situation where CDP is not sent by a switch before channel setup).

switching from subgroup cdp to mac pinning is a non-disruptive process isn't it?

have you ever experienced the VEM Install failures on ESX hosts I've mentioned above?

thx Lukas

Robert Burns · 01-25-2010

Lukas,

Thanks for sharing your experience in detail. The more customers communicate with us, the better we can assist & improve our products.

In addition to what Naren pointed out I had a few comments/suggestions for you.

On point 2. There is a known limitation that VMware VUM can't install the VEM component on the host it's running on. This is a VMware VUM limitation with vCenter, not 1000v and is supposed to be addressed in a future release of VUM. If you had troubles with a manual install I would be curious to know the error message you were receiving.

To point 3. I'd like to know your method of migration, as some hosts migrated successfully and others did not. If you have further details on your procedure we might be able to pinpoint any issues in the process. You might already be aware of this, but ensure your IP Storage VLAN is defined as a "System VLAN" on BOTH the VM AND uplink Port Profiles. If it is missing in either place, the VLAN will not come up following a reboot until the VSM -> VEM communication link has been established. This can result in issues when VMs are booting from shared storage.
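
As a rough sketch of the "system VLAN in both places" point, assuming VLAN 30 is the IP storage VLAN and using made-up profile names (neither comes from this thread; the `type ethernet` / `type vethernet` keywords also depend on the release in use):

```
! Hypothetical uplink profile: the storage VLAN is allowed AND a system vlan
port-profile type ethernet system-uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 10,20,30
  no shutdown
  system vlan 30
  state enabled

! Hypothetical VMkernel profile for iSCSI: same VLAN marked as system vlan
port-profile type vethernet iscsi-vmk
  vmware port-group
  switchport mode access
  switchport access vlan 30
  no shutdown
  system vlan 30
  state enabled
```

Other VLANs that have to forward before the VSM is reachable (Service Console, control / packet) would be added to the `system vlan` lines in the same way. A system VLAN also has to be part of the allowed or access VLAN of its profile.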

Point 4. Using CDP sub-grouping you will get better bandwidth utilization as you're channelling multiple uplinks together. Mac pinning, on the other hand, is a sure-fire, simple way to configure uplinks without any dependencies on the upstream CDP flows or switch configuration.

Point 5. Migration - you're correct here. Two things to know when migrating VMKernel ports to/from the 1000v. First, ensure you disable DRS/HA on the cluster whenever modifying VMKernel ports, to prevent vCenter moving VMs to the host you're migrating. The other, as you found out, is to move VMs that are IP storage dependent to another host during this migration step. Just as with any switch, if you unplug and re-plug its uplinks there will be a brief network disruption, and IP storage may not be able to tolerate that.

As for migrating vCenter I can make this suggestion. Create a 1000v VM Port Profile specifically for your vCenter, or at least a "Management" Port Profile. This will allow you to again define your vCenter VLAN as a "system vlan". Ensure you also have your vCenter VLAN defined as a system vlan on your uplink port profile. As for migrating it, after you've done the previous steps, just edit your vCenter VM properties, change the Network Adapter binding and point it to your new vCenter Port Profile on the 1000v. There's no need to shut down vCenter during this process. You should not notice any interruption at this time. As stated earlier, just ensure you've disabled DRS/HA to prevent any uncontrolled VM movement.

Regards,

Robert

Lukas Mazur · 01-27-2010

hi Robert,

ad point 2: thank you for that information. i didn't know about that limitation. all VEMs we installed were installed using VUM so I can't say anything about manual installation at this point.

ad point 3: the exact migration method was:

prerequisites:

* all VMs were moved away from that host

* HA / DRS was disabled on that cluster

* Mgmt / vMotion Vlans / portgroups were changed from untagged (native vlan) to tagged vlans and the uplink was changed to trunk (to support control and packet vlan for VSM)

* the uplink port profiles for Service Console, control / packet vlan, iSCSI, vMotion were configured as system vlans and configured on the uplink switches

* the port profiles for Service Console, control/packet vlan, iSCSI and vMotion were also configured as system vlans.

migration steps:

* VEM installation using VUM: the hosts had different minor patch levels (all at Update 1, but not all further patches installed on all hosts). The VEM install itself ran without any issues. In the first step one NIC from each vSwitch / vDS was moved to Nexus. No functions were moved at this point.

* moving of Service Console "function" to Nexus1000v and verification of access

* moving vMotion to Nexus

* moving iSCSI to Nexus

At this point we had troubles on 2 hosts. The result was that, for example, the service console "function" was bound to the Nexus1000v AND to the vSwitch. This resulted in poor performance (SC), weird iSCSI behavior, and so on. Migrating the functions back to the vSwitch using the vSphere Client didn't help, so we decided to reinstall the ESX host itself. The VEM-VSM communication worked without any problems and we also experienced no problems with the vCenter-VSM connection. The VEM was shown as a well-functioning module in sh module, and so on. After reinstallation everything worked as it should without any problems. My question at this point: could it be that VUM installed a wrong version (for that host) of the VEM module and that this led to the described behavior? Would you prefer manual installation over a VUM-assisted one?

* on working hosts the second NICs were moved to Nexus.

ad point 4: thanks for this tip

ad point 5: that was also the way i was thinking of - thank you for "verification"

Thanks and BR

Lukas

Robert Burns · 01-27-2010

Lukas,

As for your question on VUM vs. manual VEM install: my personal preference is manual, assuming you're not talking about >20 hosts. VUM is a nice automated way, but it's a rolling machine that you can't really control once it starts. Each host will be updated sequentially, but there's no opportunity to migrate VMs around to prevent downtime. There are known issues with VUM and I think VMware is planning some much needed enhancements to this in the future. With the manual install you can safely migrate VMs to other hosts, upgrade one host at a time and verify each upgrade after it's complete. I'd rather know if there were any issues with my first host before it tried to continue on with the others.

I've also found there are cases where VUM just can't install the VEM software. You sometimes need to completely remove the host from the DVS, jump on the ESX CLI and issue a "vem-remove -d" to completely remove the old agent, and then re-add your host to the DVS. This will push the full new VEM software, rather than running an upgrade script against an existing one. This does of course require that there are no ports in use on that host (VMkernel or vEthernet), so you'd need to migrate these temporarily back to a vSwitch if you attempt this procedure.

Anyways, that's my 2 cents.

Robert