04-29-2011 02:04 PM - edited 03-01-2019 09:54 AM
Hello All,
I have an issue with our ESX 4.0 environment on Cisco UCS blades with firmware version 1.1(1L). We are booting our blades from DAS and have our datastores connected to a NetApp FAS3140 via iSCSI. Distributed Virtual Switches are controlled by Nexus version 4.0(4)SV1(2).
The issue is with creating clones of VMs with larger VMDKs. Once we kick off a clone job, it runs for a while (30 to 45 minutes) and then just hangs. Eventually the host disconnects itself from vCenter because the management services hang. There aren't any real ESX server logs generated that point to a problem. You can also look at the I/O graph (attached) for the ESX host: it shows high I/O up until the point of the clone freeze and then "normal" I/O until the ESX host management services freeze. We even had one clone that corrupted a datastore (thank god there was nothing else on it).
We have no problems cloning VMs that are 30 or 40GB, but above that we see the issue.
Any ideas would be great!
04-29-2011 05:34 PM
Hey Seth,
A few questions:
1. Blade adapter model, Firmware version & OS driver version being used?
2. Are you using a dedicated VLAN and/or NIC for iSCSI traffic?
3. Are you using iSCSI multipathing on the 1000v?
4. Are you using Jumbo Frames for iSCSI traffic?
5. How "large" are the VMDKs you're attempting to clone?
6. When the host "disconnects" from vCenter you should gather a "vm-support" bundle. There will be clues in the vmkernel logs as to what's happening during that time on the host with the vpxd & vpxa agents.
Let's start here then we'll dig deeper.
Regards,
Robert
FYI - UCSM 1.1 is quite old at this stage. The latest release is 1.4.2, which includes a great many feature & performance improvements. No need to jump to upgrading just yet, but I wanted you to be aware.
05-02-2011 02:21 PM
Hi Robert,
Here are the answers below. I apologize in advance as I am new to these Cisco Blades.
1. Blade adapter model, Firmware version & OS driver version being used? The Blade is a Cisco B200-M1 Firmware 1.1(1L).
2. Are you using a dedicated VLAN and/or NIC for iSCSI traffic? We are using a dedicated VLAN but not a dedicated NIC.
3. Are you using iSCSI multipathing on the 1000v? No.
4. Are you using Jumbo Frames for iSCSI traffic? No.
5. How "large" are the VMDKs you're attempting to clone? 100GB+
6. When the host "disconnects" from vCenter you should gather a "vm-support" bundle. There will be clues in the vmkernel logs as to what's happening during that time on the host with the vpxd & vpxa agents. There isn't anything in the logs... that's the weird thing.
05-02-2011 06:08 PM
After initiating a clone job, can you run "vm-support -s"? This will gather performance stats. I'd like to see what they say about host activity & network utilisation during the task.
Also what VMware Driver version are you using? Just paste the output of "ethtool -i vmnicX" for one of your nics.
Regards,
Robert
05-03-2011 07:59 AM
Here is the NIC version info...
driver: ixgbe
version: 1.3.36_NETQ-NAPI
firmware-version: N/A
bus-info: 0000:04:00.0
Due to the issues we are having with clones, we need to wait until a maintenance window to run one. Give me a couple of days and I'll see what I can get you.
05-03-2011 05:21 PM
Thanks Seth. Although I see you're using an Intel driver, I can't deduce whether your blade adapter is the M71-KR-Q (QLogic) or M71-KR-E (Emulex). Please confirm this as well.
Robert
05-04-2011 08:32 AM
82598KR-CI (Oplin) is what I have in my documentation.
05-06-2011 02:44 PM
Hi Robert,
We still haven't had a chance to try a clone, but I have a question. Would NOT having Jumbo Frames enabled cause cloning issues? My thought is that transmitting smaller packets might make the ESX management services hang during a clone operation.
Thanks!
Seth
05-07-2011 02:34 AM
No. Though a Jumbo MTU would increase the performance & efficiency of cloning, having a default MTU will not cause an issue like this.
Robert
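To put a rough number on the efficiency point above, here is a back-of-the-envelope sketch in Python. It is not a measurement from this environment. It assumes plain IPv4 + TCP headers with no options (40 bytes), the usual 38 bytes of per-frame Ethernet overhead on the wire, a 100GB VMDK counted as 100 * 10**9 bytes, and it ignores iSCSI PDU headers and retransmissions. The function names are made up for illustration.

```python
import math

# Assumed per-frame costs (not taken from the thread):
IP_TCP_HEADERS = 40   # 20-byte IPv4 header + 20-byte TCP header, no options
ETH_OVERHEAD = 38     # 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap


def frames_needed(payload_bytes, mtu):
    """Frames required to carry payload_bytes at a given MTU."""
    per_frame = mtu - IP_TCP_HEADERS  # TCP payload bytes per frame
    return math.ceil(payload_bytes / per_frame)


def wire_efficiency(mtu):
    """Fraction of on-the-wire bytes that is TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


if __name__ == "__main__":
    vmdk_bytes = 100 * 10**9  # hypothetical 100GB clone
    for mtu in (1500, 9000):
        print(
            f"MTU {mtu}: {frames_needed(vmdk_bytes, mtu):,} frames, "
            f"{wire_efficiency(mtu):.1%} payload efficiency"
        )
```

Under these assumptions the jumbo MTU cuts the frame count by roughly a factor of six and recovers a few percent of wire bandwidth. That is a throughput and CPU benefit only, which is consistent with the default MTU not being the cause of a hang.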