Problem with UCS C210 M2 or ESXi?

James Hawkins · 12-07-2012

Hi,

I deal mainly with voice and have limited UCS knowledge, so I am looking for some help troubleshooting issues with a UCS C210 M2 that I have deployed to host some Cisco CallManager and Unity Connection servers.

Twice over the past couple of weeks all the virtual machines have shut down.

I need to work out why this has happened. The first thing I want to check is the uptime of the UCS C210 M2 server itself before moving on to look at VMware.

I cannot see how to do this from either CIMC or an SSH connection to the server.

CIMC is currently running firmware version 1.4(3k).

Also, if anyone can tell me how to check the uptime of the ESXi host, I would be grateful.

Thanks

James

padramas · 12-07-2012

Hello James,

CIMC will be up and running as long as the PSU cables are plugged into a power source. CIMC uptime is not available via the CLI. You can collect the CIMC tech support output, and it will have the information in the tmp/_techsupport.txt file.

For ESXi, you can view the uptime in the vSphere client or by running the "uptime" command (via an SSH session or the local console).

What exactly happens to your UC VMs? Are they using local hard disks for storage?

Padma

James Hawkins · 12-07-2012

Hi Padma,

Thank you for your response. I cannot find the uptime in vSphere (I only have access to the free version downloaded from ESXi), but I have managed to get the uptime using the CLI. The output is shown below.

~ # uptime
13:59:29 up 04:18:36, load average: 0.05, 0.06, 0.06
~ #

This matches the time when the UC VMs went down.
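As a rough cross-check, the boot time can be worked out by subtracting the uptime from the current time in the output above. The following is only a minimal sketch: it assumes awk is available, that the uptime is under 24 hours and printed in the HH:MM:SS form shown, and boot_time is just a name made up for illustration.

```shell
# boot_time NOW UPTIME
#   NOW    = current time as printed by `uptime` (HH:MM:SS)
#   UPTIME = the "up" value as printed by `uptime` (HH:MM:SS, under 24 hours)
# Prints the estimated boot time as HH:MM:SS.
boot_time() {
    awk -v now="$1" -v up="$2" 'BEGIN {
        split(now, n, ":")
        split(up, u, ":")
        # Convert both values to seconds and subtract
        secs = (n[1] * 3600 + n[2] * 60 + n[3]) - (u[1] * 3600 + u[2] * 60 + u[3])
        # A negative result means the host booted on the previous day
        if (secs < 0) secs += 86400
        printf "%02d:%02d:%02d\n", int(secs / 3600), int((secs % 3600) / 60), secs % 60
    }'
}

boot_time 13:59:29 04:18:36   # prints 09:40:53
```

So the host came up at about 09:40:53, which is the time to compare against when the phones started unregistering.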

As for what happens, the users just notice phones unregistering and so on. By the time local support notice this and open vSphere to check on the VMs, they find them in the shutdown state and manually restart them. I guess I could set the VMs to start automatically when ESXi itself boots, but I really need to isolate what is causing the problem.

The servers are using local hard disks within the UCS server for storage.

Thanks

James

shaligowski · 01-07-2013

James,

I don't know if you found an answer, but I had a similar issue and it had to do with interrupts. I don't know if it will help, but it was close enough that I thought I would post it, as I found your question before I opened a TAC case.

This is from TAC on my case.

It looks like you are running into an issue with interrupt remapping which is known to cause the server to hang. The solution is to upgrade the C-series firmware to version 1.4 or greater and disable interrupt remapping in ESXi & the BIOS.

C-Series 1.4 Release Notes

•When interrupt remapping is enabled in the BIOS, the virtual Host Bus Adapters (vHBAs) and the other PCI devices respond in ESX/ESXi 4.1. (CSCth36989)

•The interrupt remapping issue on VMware ESX is now fixed. (CSCty98534)

http://www.cisco.com/en/US/docs/unified_computing/ucs/release/notes/OL_24086.html

vHBAs and other PCI devices may stop responding in ESXi 5.x and ESXi/ESX 4.1 when using Interrupt Remapping

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1030265

C-Series Firmware Download

http://software.cisco.com/download/release.html?mdfid=283862069&flowid=25882&softwareid=283850974&release=1.4(3p)5&relind=AVAILABLE&rellifecycle=&reltype=latest
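For reference, the ESXi side of the change is made from the ESXi shell. The commands below are a sketch based on my recollection of the VMware KB article linked above (1030265), so please confirm them against the KB for your exact ESXi version before running anything. A host reboot is needed for the setting to take effect, and the BIOS setting has to be changed separately.

```shell
# ESXi 5.x: disable interrupt remapping (takes effect after a reboot)
esxcli system settings kernel set --setting=iovDisableIR -v TRUE

# ESXi 5.x: check the configured value
esxcli system settings kernel list -o iovDisableIR

# ESXi/ESX 4.1: equivalent setting using esxcfg-advcfg
esxcfg-advcfg -k TRUE iovDisableIR
```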

James Hawkins · 01-08-2013

Hi,

Thank you for the response. After looking through various logs, we discovered that the UCS servers had lost power, which caused the VMs to shut down.

The power loss was caused by the UPS supporting the servers, which was running a scheduled self-test. This should not have cut the power, but for some reason it did.

Thanks for the information that you passed on. I will check the UCS firmware version to see whether the customer's server is affected.

Regards

James