03-12-2015 02:23 AM
Hello,
We had a server shutdown on a UCS C220.
In the CIMC console we see this:
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] P_CATERR_N: A catastrophic fault has occurred on one of the processors: Please check the processors' status.
P_CATERR_N: Processor sensor, Predictive Failure asserted
I searched the web for P_CATERR_N but could not find anything.
What is wrong? Any ideas? Thanks a lot
Armin
03-12-2015 09:55 AM
Hi,
Well, this is not a good sign. In my experience this error is resolved by either a firmware upgrade or an RMA. Please open a TAC case for it.
- Ashok
03-12-2015 08:37 PM
P_CATERR_N means a Processor Catastrophic Error on your server. Sometimes these errors show up during server POST and then go away the next second, so the best advice is to open a TAC case and check whether your crash matches the timestamp of the CATERR error in the logs, so we can tell you whether that is the real cause of the reboot/shutdown.
-Kenny
11-07-2017 02:02 PM - edited 11-07-2017 02:04 PM
Has anyone ever had success working around or through this? We are running the Cisco ESXi 6.5 build on our UCSC-C240-M3S with BIOS version C240M3.3.0.3a.0 (build date 03/15/17).
This is almost certainly caused by attempting to pass through a single PCI device, an NVIDIA Quadro 2000 (we have three installed; I'm just trying to pass through a single GPU). Thank you kindly for your attention to our little matter.
CIMC shows an EQUIPMENT_INOPERABLE fault: [0174][critical][equipment-inoperable][sys/rack-unit-1/board] P_CATERR_N: A catastrophic fault has occurred on one of the processors: Please check the processors' status... which then clears immediately upon power-cycling the machine. Unfortunately, the fault takes everything down with it; the single node is brought to its knees.
I have attempted to refer to this documentation: https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/sw/fault/reference/guide/Cisco_UCS_C-Series_Servers_CIMC_Faults/CIMC_Faults.html
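For anyone grepping through exported CIMC logs for this fault, the fault lines quoted above follow a fixed bracketed layout (code, severity, cause, affected object DN, then a free-text description). A minimal sketch of a parser for that layout — the function and regex names are my own, not anything from Cisco's tooling:

```python
import re

# Hypothetical parser for CIMC fault lines such as:
# [F0174][critical][equipment-inoperable][sys/rack-unit-1/board] P_CATERR_N: ...
FAULT_RE = re.compile(
    r"\[(?P<code>F?\d+)\]"        # fault code, e.g. F0174 (sometimes logged without the F)
    r"\[(?P<severity>[^\]]+)\]"   # severity, e.g. critical
    r"\[(?P<cause>[^\]]+)\]"      # probable cause, e.g. equipment-inoperable
    r"\[(?P<dn>[^\]]+)\]\s*"      # affected object DN, e.g. sys/rack-unit-1/board
    r"(?P<desc>.*)"               # free-text description
)

def parse_fault(line: str):
    """Return the fault fields as a dict, or None if the line doesn't match."""
    m = FAULT_RE.search(line)
    return m.groupdict() if m else None

line = ("[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] "
        "P_CATERR_N: A catastrophic fault has occurred on one of the processors")
fault = parse_fault(line)
print(fault["code"], fault["severity"])  # F0174 critical
```

This makes it easy to filter a long log export down to just the critical P_CATERR_N entries and compare their timestamps against the reboot time, as Kenny suggested above.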
11-08-2017 09:48 PM
11-07-2018 05:12 AM
@peeat I'm facing the same issue. Were you able to solve it?
02-12-2019 02:12 AM - edited 02-12-2019 02:16 AM
No sir, @Ali Amir, I was never able to successfully work through this. I was able to get a Windows 10 VM to briefly support the same/similar setup (a single Quadro 2000; my Cisco rack server has three of these installed, perfect for VDI deployment) in a custom HPE ESXi 6.7 build, but that only lasted until I rebooted the VM. Now the entire host hangs and shows other odd behavior. If I wait a very long time, the machine finally comes up and I can access it remotely, but if you stand there locally and watch the screen, the progress bar gets stuck and never appears to finish loading.
I'm trying to update the firmware on my C240-M3S to the latest version, 3.0(4j), and will hopefully try again with the Cisco Custom Image for ESXi 6.7 U1 GA. Will keep you posted if I make any progress, sir. Please do the same if you find a resolution. I hate having three GPUs in this machine taking up space and wasting electricity with no ability to properly utilize them. I know they are working fine, as I booted the machine into Windows and was able to apply drivers and test all three cards independently, so it's definitely an issue with either VMware's product or, more likely, my configuration.
EDIT: I can tell you that simply rebooting the host clears the "catastrophic" failure, but it certainly scared me the first time I saw it come up and wasn't sure what I had done, or whether I had truly messed something up. Thankfully it was just related to the PCI passthrough. I will be looking to invest some time in the coming days/weeks to give it another go and see if we can work around the issue.
There was this thread I referred to; it seemed to have significantly more information than anyone around here was able to provide: https://forums.servethehome.com/index.php?threads/troubleshooting-gpu-passthrough-esxi-6-5.12631/
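One workaround that comes up repeatedly in GPU-passthrough threads like the one linked above (I cannot confirm it fixes this particular CATERR fault) is overriding the PCI reset method for NVIDIA devices in the host's passthrough config. On an ESXi host that file is /etc/vmware/passthru.map; the sketch below edits a local copy purely to illustrate the entry format:

```shell
# Unverified community workaround sketch: force a D3->D0 power-state reset
# for NVIDIA (vendor ID 10de) devices instead of the default reset method.
# On a real ESXi host the file is /etc/vmware/passthru.map; we use a local
# copy here so nothing on the host is touched.
CFG=/tmp/passthru.map
cp /dev/null "$CFG"
# Format: vendor-id  device-id  reset-method  fptShareable
# 10de = NVIDIA; ffff = match any device ID; d3d0 = power-state reset
echo "10de ffff d3d0 default" >> "$CFG"
cat "$CFG"
```

After editing the real file, the host has to be rebooted and passthrough re-toggled for the GPU before the new reset method takes effect.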
05-15-2019 03:50 AM
This appeared on one of our servers; however, we do not use GPU cards in these systems. We are running the latest firmware (ucs-c240m5-huu-4.0.2f) and the ESXi 6.5 U2 Custom ISO for Cisco (VMware-ESXi-6.5.0-9298722-Custom-Cisco-6.5.2.2). Before opening a TAC case I wanted to see if anyone came to a resolution.
09-15-2019 03:49 PM
@brian-henry my apologies, I missed your reply. Sadly, I was never able to resolve the issue, despite being able to replicate it on demand. I got sick of crashing the entire host, so I gave up and stopped messing with the GPUs, and the machine has continued to operate wonderfully (as long as I'm not tinkering with PCI passthrough). I've heard others talk about temperature sensors going haywire and causing random issues, but I'm hardly an expert on such things.
Hopefully in the months since you've been able to resolve your issue(s).
07-21-2022 11:19 PM
I am facing the same issue on an HX220C M5SX, firmware version 4.1(1d).