03-23-2017 07:52 AM - edited 03-01-2019 01:07 PM
I'm having an issue with this fault: F0411 Thermal condition on chassis 1 is upper-non-recoverable. Does anyone have any advice on how to resolve this?
03-23-2017 07:57 AM
If you check the various temperatures on the blades, IOMs, and PSUs and everything looks normal, you may be hitting an issue where the I2C bus, which the IOMs, PSUs, fans, and blade CMCs use to transfer environmental information, is jammed up by one of the devices (e.g., a fan).
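To illustrate how a jammed bus can raise a thermal fault while every real temperature looks normal, here is a minimal Python sketch. It assumes, hypothetically, that a failed I2C read surfaces as an all-ones byte (0xFF, i.e. 255), and the thresholds are made-up values; nothing in this thread says how UCS actually handles failed reads or what its thresholds are. The state names follow the IPMI-style "upper" threshold terms used in the fault text.

```python
# Illustrative only: these thresholds are hypothetical, not Cisco's actual values.
UPPER_NON_CRITICAL = 70
UPPER_CRITICAL = 80
UPPER_NON_RECOVERABLE = 90

# Assumption: a read that fails on a jammed bus comes back as an all-ones byte.
FAILED_READ = 0xFF  # 255


def classify(temp_c: int) -> str:
    """Map a reading onto IPMI-style upper threshold states."""
    if temp_c >= UPPER_NON_RECOVERABLE:
        return "upper-non-recoverable"
    if temp_c >= UPPER_CRITICAL:
        return "upper-critical"
    if temp_c >= UPPER_NON_CRITICAL:
        return "upper-non-critical"
    return "normal"


def looks_like_bus_error(temp_c: int) -> bool:
    """Flag the all-ones pattern as a likely bad read rather than a real temperature."""
    return temp_c == FAILED_READ


# Hypothetical readings: every real temperature is normal, but one read failed.
readings = {"blade-1": 41, "iom-1": 48, "psu-2": 36, "fan-3": FAILED_READ}

for sensor, value in readings.items():
    note = " (suspect bus read)" if looks_like_bus_error(value) else ""
    print(f"{sensor}: {value} -> {classify(value)}{note}")
```

In this toy run, only fan-3 reports upper-non-recoverable, and only because its "temperature" is really a failed read, which matches the symptom of a thermal alert alongside normal temperatures everywhere else.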
Later firmware alleviates most of the timing issues that can lead to these conditions, but the fans and PSUs must be reseated to pick up the new I2C timing programming.
I would try reseating the PSUs and fans, one at a time, and see whether the alerts clear.
If you have continued issues, then you'll need to open a TAC case to have logs reviewed.
Thanks,
Kirk...
03-23-2017 08:24 AM
Thank you, I will try a few of these.
07-28-2017 08:35 AM
Were you able to resolve this? I am getting the same error on a chassis with firmware 3.1(3a), 3.1(3b), and now 3.1(3c). I have Cisco SR 682748677 open and have gone through the process of reseating all the modules:
I2C workaround:
++ Performed the I2C workarounds below.
Reseat: take the module out, wait 2 minutes, then put it back
(for an IOM, wait 3 minutes for the database to be updated)
1) Reseat each fan, one at a time
2) Reseat each power supply, one at a time
3) Remove IOM-1, wait until IOM-2 is fully operational, then reseat it
4) Watch the FSM as the IOM returns and wait for it to become fully operational again
5) Once IOM-1 is back online and all the VIF paths on the blades are up, remove IOM-2, then reseat it
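The reseat sequence above can be written down as a simple checklist so the order and wait times are not lost mid-procedure. This is a minimal Python sketch that only prints a plan; it does not talk to UCS Manager. The module names (FAN-1, PSU-1, and so on) are hypothetical labels, and the counts of 8 fans and 4 PSUs are an assumption for a fully populated 5108.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    module: str
    wait_minutes: int  # how long the module stays out before it is reinserted
    precondition: str = ""


def reseat_plan(fans: int, psus: int) -> list[Step]:
    """Build the one-at-a-time reseat order: fans, then PSUs, then each IOM."""
    plan = [Step(f"FAN-{i}", 2) for i in range(1, fans + 1)]
    plan += [Step(f"PSU-{i}", 2) for i in range(1, psus + 1)]
    # IOMs stay out 3 minutes so the database can update, and are never out together.
    plan.append(Step("IOM-1", 3, "IOM-2 fully operational; watch the FSM until IOM-1 returns"))
    plan.append(Step("IOM-2", 3, "IOM-1 back online and all blade VIF paths up"))
    return plan


if __name__ == "__main__":
    plan = reseat_plan(fans=8, psus=4)  # assumed counts for a fully populated 5108
    for n, step in enumerate(plan, start=1):
        extra = f"  [only when: {step.precondition}]" if step.precondition else ""
        print(f"{n:2}. reseat {step.module}, out for {step.wait_minutes} min{extra}")
    print("total time with modules out:", sum(s.wait_minutes for s in plan), "min")
```

With those assumed counts the plan has 14 steps and about 30 minutes of modules-out time, not counting the time spent waiting for each IOM to come fully back.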
The error still persists. They are running the issue by the development team, as they do not see any problems on the back end.
09-22-2017 11:51 AM - edited 09-22-2017 11:54 AM
We spent 4 months troubleshooting this with Cisco TAC and finally gave up and replaced it with another solution. Having false alarms go off every 5 minutes is no way to monitor a system. This I2C crap has been a bug for years with Cisco, and they can't seem to fix it. It's getting old, Cisco. We are running 3.1(2f), by the way. Yes, we have reseated all the equipment; yes, we have opened 5+ TAC cases; no, Cisco still has no fix for this. We like UCS, but this is a major oversight.
12-05-2017 08:40 AM
I am having the same issue after upgrading from 3.2(1d) to 3.2(2c). I am working with TAC right now. I have reseated everything, but it came back. I will see what they come back with. I agree this I2C bug always seems to pop up throughout the life of this product. I really think highly of the UCS platform, but the frustration of these types of issues always makes me fear upgrades and is pushing me to think that the cloud model is not such a bad direction to go. I will reply if there is any fix for me.
02-05-2018 05:51 PM
We are running 3.1(2f) and have the same issue as well. We have attempted all the workarounds (i.e., reseating the PSUs, fans, and IOMs) but had no luck resolving the issue. However, the alert is triggered intermittently, about once a week, and not as often as others have reported.
Please post an update if anyone is able to resolve this.
02-06-2018 06:37 AM
02-06-2018 12:47 PM
02-06-2018 12:50 PM
02-06-2018 03:18 PM
Thank you. We are running B200 M4 blades in the 5108. It looks like a firmware update might be the way to go. We will probably wait until Cisco releases the new firmware that addresses the Intel CPU vulnerability and target that release for install.
02-07-2018 07:03 AM
02-07-2018 03:24 PM
I have no idea whether the new firmware will fix the thermal issue; I'm just going by Cisco's previous practice, where each new firmware release attempts to fix a number of bugs. As you mentioned, it will be interesting to see how the new firmware addresses the Intel vulnerability. Crossing fingers.
06-19-2018 08:58 AM
This is an old thread now, but I wanted to throw out there that the I2C bus is hardware that has had software issues in the past, such as overly aggressive timers. However, just because you are having issues on the I2C bus does not mean it is a software issue every time. The PCA9541 devices (I2C bus master selectors) in the fans, PSUs, etc. can go bad or hold a lock on the bus. In some instances (though it shouldn't be the go-to solution), a fan, PSU, or IOM has to be replaced to correct I2C bus congestion and false alerts.
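A device holding a lock on a shared bus starves every other reader, which is why one bad fan or PSU can produce missing or stale readings for the whole chassis. The Python sketch below is a generic toy model of that idea, not a model of the PCA9541's actual arbitration logic or of UCS firmware; the class, function, and sensor names are hypothetical.

```python
from __future__ import annotations


class SharedBus:
    """Toy model of one shared bus: only one device may own it at a time."""

    def __init__(self) -> None:
        self.owner: str | None = None

    def acquire(self, device: str) -> bool:
        if self.owner in (None, device):
            self.owner = device
            return True
        return False  # someone else holds the bus

    def release(self, device: str) -> None:
        if self.owner == device:
            self.owner = None


def poll(bus: SharedBus, sensors: dict[str, int], stuck: str | None = None) -> dict[str, int | None]:
    """Read each sensor once; a 'stuck' device takes the bus and never lets go."""
    results: dict[str, int | None] = {}
    for name, value in sensors.items():
        if not bus.acquire(name):
            results[name] = None  # read times out: no fresh data for this sensor
            continue
        results[name] = value
        if name != stuck:
            bus.release(name)
    return results


sensors = {"fan-1": 30, "psu-1": 35, "iom-1": 45}

bus = SharedBus()
print(poll(bus, sensors, stuck="fan-1"))  # {'fan-1': 30, 'psu-1': None, 'iom-1': None}

# Forcing the stuck device off the bus (loosely, what a reseat or replacement achieves)
bus.release("fan-1")
print(poll(bus, sensors))  # {'fan-1': 30, 'psu-1': 35, 'iom-1': 45}
```

In the model, the stuck device's own reading still looks fine while everything polled after it goes dark, which is one reason the faulty component is not always the one that raises the alert.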