
Cisco UCS 5108 thermal problem

Sergey Sakharov
Level 1

Hi everyone!

I've installed my first UCS system: two UCS 5108 chassis and two UCS 6248 fabric interconnects.


The first chassis holds six blade servers (two B230 M2 and four B200 M2); the second holds five B200 M2. I've got two air conditioners in the server room running at their maximum. Over the last week I've received three faults on the first chassis (Fault Code: F0411). The IOM temperature was about 45-46 °C. After that I moved three blade servers to the second chassis until I solve this problem.

UCSM version - 2.0.2r

Everything else is fine, except for the thermal problem. All blade servers are discovered, with 0 errors and 0 critical faults.

15 Replies

padramas
Cisco Employee

Hello Sergey,

Please open a TAC service request and attach the UCSM and Chassis 1 and 2 tech support bundles.

http://www.cisco.com/en/US/docs/unified_computing/ucs/ts/Frame-Files-Converted-to-DITA--Do-Not-Use/TS_GeneralTroubleshooting.html#wp1073749

We need more logs to investigate the thermal fault.

Padma

Please physically reset the IOM in that chassis. I have done this twice for the thermal issue, and the issue never recurred.

Ram

I'll try to reset them. Could you tell me how to do this correctly?

Just unplug the right-hand IOM and plug it back in. Wait 20 minutes for everything to come up, then repeat the steps for the other side.

Ram

So I don't turn the power off? Just unplug and reinsert one IOM, and then the other?

Yes, don't power off; it is not required.

I can't open a TAC case at the moment; my SmartNet contract is still being registered. I've created tech support files for Chassis 1 and 2. Should I post them here or wait for my SmartNet?

David Alpizar
Cisco Employee

This can be caused by an I2C issue on the server.

You can try the following:

Reset the fans one by one.

Reset the PSUs one by one.

Finally, reset the IOMs, starting with the faulty one.

Also, determine which blade is showing any alarms and try to reseat that blade in the chassis.

Please make sure to wait a couple of minutes between resetting each component.

How do I do this correctly? Power off, then reset, or what?

kg6itcraig
Level 1

I think this is a code bug and you need to go to 2.0(q). I'm running two 6248 systems at that level and not having the issue. This thermal stuff plagued ALL the 1.4x releases.

Craig

My UCS Blog http://realworlducs.com

I've got the same errors on 2.0.2q...

Sergey,

If this is a real I2C issue, you may still see the same behavior on a 2.0.x release if the I2C bus was not cleared before the upgrade. (At the moment I don't know whether you recently performed an upgrade on the system.)

The I2C bus transports information about the different components of the Unified Computing System: chassis, IOMs, fans, PSUs, etc. What happens is that all those components try to send their status updates at the same time, the I2C bus gets overwhelmed, and then no one can report their real status. So we usually recommend that the customer reseat all major components, one at a time, to clear the bus and then do the upgrade. If that was not done before the upgrade, it should still be done afterwards.

Try reseating the fans and PSUs, one at a time, leaving a minute in between, and then the IOMs, one at a time, leaving three minutes in between and beginning with the subordinate to cause minimum disruption.
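The reseat order above can be written down as a checklist. This is only a minimal Python sketch: the component counts (8 fans, 4 PSUs, 2 IOMs), the function name `reseat_plan`, and the choice of which IOM is subordinate are illustrative assumptions, so adjust them to your own chassis.

```python
def reseat_plan(fans=8, psus=4, subordinate_iom=2, primary_iom=1):
    """Return (component, seconds_to_wait_afterwards) tuples in reseat order.

    All defaults are assumptions for illustration, not values read from UCSM.
    """
    plan = []
    # Fans and PSUs first, one at a time, one minute apart.
    plan += [(f"Fan {n}", 60) for n in range(1, fans + 1)]
    plan += [(f"PSU {n}", 60) for n in range(1, psus + 1)]
    # IOMs last, three minutes apart, subordinate side first.
    plan.append((f"IOM {subordinate_iom} (subordinate)", 180))
    plan.append((f"IOM {primary_iom} (primary)", 180))
    return plan


if __name__ == "__main__":
    for step, (component, wait) in enumerate(reseat_plan(), start=1):
        print(f"{step:2d}. Reseat {component}, then wait {wait} s")
```

Printing the plan before going to the rack makes it easier to keep the one-minute and three-minute gaps and not skip a component.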

If this does not clear the situation, you will need to remove the components already mentioned, one at a time, and run "show tech-support chassis # all brief" to see what the I2C bus reports segment by segment (chassis, fans, PSUs...). Once you remove a component and the errors on each segment stop incrementing, you have found your faulty piece of hardware, and a TAC case will be needed to send a replacement.
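The isolation logic described here can be sketched in Python. The sketch assumes you have already copied the per-segment I2C error counters out of two tech-support outputs by hand; it does not parse the real output, and the segment names, counter values, and function names are hypothetical.

```python
def errors_still_incrementing(before, after):
    """Return the segments whose error counter grew between two snapshots."""
    return sorted(seg for seg in after if after[seg] > before.get(seg, 0))


def find_suspects(observations):
    """observations maps a removed component to (before, after) snapshots
    taken while that component was out of the chassis.

    A component is a suspect if no segment's error counter grew while
    it was removed.
    """
    return [component
            for component, (before, after) in observations.items()
            if not errors_still_incrementing(before, after)]


if __name__ == "__main__":
    # Hypothetical counters, made up for illustration only.
    observations = {
        "Fan 3": ({"chassis": 10, "fans": 40, "psus": 5},
                  {"chassis": 12, "fans": 47, "psus": 5}),
        "PSU 2": ({"chassis": 12, "fans": 47, "psus": 5},
                  {"chassis": 12, "fans": 47, "psus": 5}),
    }
    print(find_suspects(observations))
```

With these made-up numbers only "PSU 2" is flagged, because the counters kept climbing while "Fan 3" was out but stayed flat while "PSU 2" was out.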

For further analysis or assistance, I strongly recommend a TAC case to be opened.

-Kenny

I have gone through powering off the whole 6248 UCS on 2q, and the issues remain.

Craig

My UCS Blog http://realworlducs.com

Actually, you don't have to power off the 6248 FIs. An effective but drastic solution is to decommission and power-cycle the chassis that is generating those faults, including removing the power cords. Then wait a minute and recommission the chassis. After that, all thermal faults should go away.
