Having an issue with this fault: F0411 Thermal condition on chassis 1 is upper-non-recoverable.

Tom Lundy
Level 1

I am having an issue with this fault: F0411 Thermal condition on chassis 1 is upper-non-recoverable. Does anyone have any advice on how to resolve this?

Kirk J
Cisco Employee

If you check the temperatures on your blades, IOMs, and PSUs and everything looks normal, then you may be hitting an issue where the I2C bus, which the IOMs, PSUs, fans, and blade CMCs use to transfer environmental info, is jammed up by one of the devices (e.g. a fan).

Later firmware alleviates most of the timing issues that can lead to these conditions, but the fans and PSUs have to be reseated before they pick up the new I2C timing programming.

I would try reseating the PSUs and fans, one at a time, and see if the alerts clear.

If you have continued issues, then you'll need to open a TAC case to have logs reviewed.

Thanks,

Kirk...

Thank you, I will try a few of these.

Not applicable

Were you able to resolve this? I am getting the same error on a chassis with firmware 3.1(3a), 3.1(3b) and now 3.1(3c). I have Cisco SR 682748677 open and have gone through the reseating process of all the modules:

I2C workaround:
++ Performed the I2C workarounds below.
Reseat: take the module out, wait 2 minutes, then put it back
(for an IOM, wait 3 minutes for the database to be updated)

1) Reseat each fan, one at a time
2) Reseat each power supply, one at a time
3) Remove IOM-1, wait until IOM-2 is fully operational, then reseat it
4) Watch the FSM as the IOM returns and wait for it to become fully operational again
5) Once IOM-1 is back online and all the vif paths on the blades are up, remove IOM-2, then reseat it
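For anyone who wants to keep the ordering straight during a maintenance window, the steps above can be sketched as a small checklist generator. This is only an illustration, not a Cisco tool: the names `ReseatStep` and `build_reseat_plan` are made up for this example, and the default fan and PSU counts are assumptions that should be adjusted to match the actual chassis. The wait times are the ones quoted in the workaround above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReseatStep:
    """One manual reseat action (hypothetical helper type for this sketch)."""
    component: str
    wait_minutes: int  # how long the module stays out before reinsertion
    note: str = ""


def build_reseat_plan(num_fans: int = 8, num_psus: int = 4) -> List[ReseatStep]:
    """Return the reseat steps in the order given in the workaround.

    The default counts are assumptions; pass the real numbers for your chassis.
    """
    steps: List[ReseatStep] = []
    # Steps 1 and 2: fans first, then power supplies, strictly one at a time.
    for i in range(1, num_fans + 1):
        steps.append(ReseatStep(f"Fan {i}", 2))
    for i in range(1, num_psus + 1):
        steps.append(ReseatStep(f"PSU {i}", 2))
    # Steps 3 and 4: IOM-1, keeping IOM-2 in service the whole time.
    steps.append(ReseatStep(
        "IOM-1", 3,
        "wait until IOM-2 is fully operational, then watch the FSM "
        "until IOM-1 is fully operational again"))
    # Step 5: IOM-2 only after IOM-1 has fully recovered.
    steps.append(ReseatStep(
        "IOM-2", 3,
        "start only once IOM-1 is back online and all vif paths "
        "on the blades are up"))
    return steps


if __name__ == "__main__":
    for number, step in enumerate(build_reseat_plan(), start=1):
        line = f"{number:2d}. Reseat {step.component} (out for {step.wait_minutes} min)"
        if step.note:
            line += f": {step.note}"
        print(line)
```

Running the script prints a numbered checklist. Nothing in it talks to UCS Manager, so the FSM and vif path checks still have to be done by hand.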

The error still persists. They are running the issue by the development unit, as they do not see any issues on the back end.

We spent 4 months troubleshooting this with Cisco TAC and finally gave up and replaced it with another solution. Having false alarms go off every 5 minutes is no way to monitor a system. This I2C crap has been a bug for years with Cisco and they can't seem to fix it. It's getting old, Cisco. We are running 3.1(2f), by the way. Yes, we have reseated all the equipment; yes, we have opened 5+ TAC cases; no, Cisco still has no fix for this. We like UCS, but this is a major oversight.

jettacone
Level 1

I am having the same issue after upgrading from 3.2(1d) to 3.2(2c). I am working with TAC right now. I have reseated everything, but the fault came back. I will see what they come back with. I agree this I2C bug always seems to pop up throughout the life of this product. I think highly of the UCS platform, but the frustration of these types of issues makes me fear upgrades and is pushing me to think that the cloud model is not such a bad direction to go. I will reply if there is any fix for me.

We are running 3.1(2f) and have the same issue as well. We have attempted all the workarounds (i.e. reseating the PSUs, fans, and IOMs), but had no luck getting the issue resolved. That said, the alert is triggered intermittently, about once a week, and not as often as someone else has reported.

Please post an update if anyone is able to resolve this. 

We just know that this affects Gen 1 chassis so we are avoiding the Gen 1 chassis on the next lease. Our sales engineer said this is an issue with the circuitry of the chassis/boards.

Hi Sentillatiben



I had to upgrade to the latest BIOS on those particular blades b2030 M2 and the issue resolved after that.


Thanks for the info. For us this is a chassis issue. Cisco offered to replace it, but we opted for a single outage when our lease comes due. It's a pain all around.

Thank you. We are running the B200 M4 blades on the 5108. It looks like a firmware update might be the way to go. We are probably going to wait until Cisco releases the new firmware that addresses the Intel CPU vulnerability and target that release for install.

So do you know if this firmware has an actual fix for this issue in it? We have tried firmware upgrades many times so just wondering if this has an actual fix for the issue. Yes, Feb 17th or so is the Intel firmware fix, should be fun.

No idea whether the new firmware will fix the thermal issue. I am just going by previous Cisco practice, where a new firmware release attempts to fix a number of bugs. As you mentioned, it will be interesting to see how the new firmware addresses the Intel vulnerability. Crossing fingers.

This is an old thread now, but I wanted to throw out there that the I2C bus is hardware that has had software issues in the past, such as overly aggressive timers. However, having issues on the I2C bus does not mean it is a software issue every time. The PCA9541 devices in the fans, PSUs, etc. can go bad or hold a lock on the bus. In some instances (this shouldn't be the go-to solution), a fan, PSU, or IOM needs to be replaced to correct I2C bus congestion and false alerts.
