01-22-2020 01:31 AM
Hi folks, we keep seeing this error and the server crashes. I am being told by Cisco to update the i40e and megaraid drivers, which I don't believe will help. Is this a genuine hardware error?
01-22-2020 01:32 AM
2020 Jan 21 16:49:53 UTC
Informational
EQUIPMENT_INOPERABLE
[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Cleared
2020 Jan 21 16:45:09 UTC
Critical
EQUIPMENT_INOPERABLE
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status.
01-22-2020 04:17 AM
That particular error, especially if there were no preceding DIMM failure alerts, is frequently either a CPU or system board issue, although you can also see that message when there are problems with how the OS interacts with the processor C-state power settings.
I would pull down the UCS diag ISO and let that run, and upload/provide the results of that to a TAC case.
Description: Diagnostics for the Unified Computing System (UCS) C-Series Servers
ucs-cxx-diag.6.0.4b.iso
Kirk...
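On the C-state point above, a quick way to see which idle states the running Linux kernel actually exposes is to read the cpuidle entries in sysfs. The following is only a minimal sketch, assuming a Linux host that exposes the standard `/sys/devices/system/cpu/cpu0/cpuidle` directory; the function name `list_cstates` is made up for illustration, and it says nothing about the BIOS-level power settings themselves.

```python
from pathlib import Path


def list_cstates(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return a list of (name, disabled) tuples for one CPU's idle states.

    Returns an empty list if the cpuidle directory is not present.
    """
    base = Path(cpu_dir)
    if not base.is_dir():
        return []

    states = []
    # Entries are named state0, state1, ...; sort numerically, not lexically.
    for state in sorted(base.glob("state*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        disable_file = state / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((name, disabled))
    return states


if __name__ == "__main__":
    for name, disabled in list_cstates():
        print(f"{name}: {'disabled' if disabled else 'enabled'}")
```

If deep states such as C6 show up as enabled in the OS while the crashes continue, that is worth noting in the TAC case, but it does not by itself prove or rule out a hardware fault.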
01-22-2020 04:24 AM
Thank you for the reply.
I have run the diagnostic tool previously and it does not seem to highlight an issue; however, we see the server crash after a few days in use, which is very odd. The same happens with RHEL 7.4 or 7.6. TAC want us to update the drivers as they are not in the compatibility matrix, which I am not sure of.
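Before comparing against the compatibility matrix, it helps to record exactly which driver versions are loaded. Below is a minimal sketch that reads the version string each module publishes under `/sys/module`. It assumes a Linux host and assumes the MegaRAID kernel module is named `megaraid_sas` (the thread only says "megaraid"); the helper name `loaded_driver_versions` is hypothetical. Not every module publishes a version file, in which case the value comes back as `None`.

```python
from pathlib import Path


def loaded_driver_versions(modules=("i40e", "megaraid_sas"), sys_module="/sys/module"):
    """Map each module name to its reported version string, or None if unavailable."""
    versions = {}
    for mod in modules:
        version_file = Path(sys_module) / mod / "version"
        if version_file.is_file():
            versions[mod] = version_file.read_text().strip()
        else:
            # Module not loaded, or it does not expose a version attribute.
            versions[mod] = None
    return versions


if __name__ == "__main__":
    for mod, version in loaded_driver_versions().items():
        print(f"{mod}: {version if version is not None else 'not loaded / no version exposed'}")
```

Running this under both RHEL 7.4 and 7.6 and attaching the output to the case gives TAC concrete versions to check, rather than a general "drivers are out of date" discussion.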
01-22-2020 04:33 AM
You might want to try booting a Linux-on-a-stick ISO distribution and letting it sit for a few days to see if the problem goes away. If the problem repeats with a different OS (and likely different drivers), then you may want to provide that information to TAC.
Alternatively, you may want to leave the DIAG iso/image booted up for a couple of days and trigger multiple rounds of tests.
Kirk...
01-23-2020 01:50 AM
Thanks Kirk, I will retry the diag util. The server crashed again this evening with the same error message. I did update the CIMC and BIOS, which doesn't seem to have had an effect on the behavior.
I have tried booting the system with ucs-cxxx-drivers-linux.4.0.4d.iso as the boot media (to install different drivers, as per the TAC recommendation); however, I do not see the virtual DVD prompt. Is this the correct way to install updated drivers for the OS? RHEL 7.4 and 7.6 also use different drivers for each device type. I will investigate booting from an alternate OS ISO as well.
Andy
01-28-2020 04:32 AM
Hi
The issue keeps happening and is more frequent now. I have tried to boot the system from a USB stick with LuBuntu19 on it, and it locked up before it fully loaded. Support are still claiming I need to update the i40e and megaraid drivers.
Critical
EQUIPMENT_INOPERABLE
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status.
2020 Jan 28 12:12:19 UTC
Informational
EQUIPMENT_INOPERABLE
[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Cleared
2020 Jan 28 12:12:18 UTC
Critical
EQUIPMENT_INOPERABLE
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Please check the processors' status.
2020 Jan 28 11:47:46 UTC
Informational
EQUIPMENT_INOPERABLE
[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Cleared
2020 Jan 28 11:35:37 UTC
Critical
EQUIPMENT_INOPERABLE
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status.
2020 Jan 28 11:35:36 UTC
Informational
EQUIPMENT_INOPERABLE
[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Cleared
2020 Jan 28 11:35:35 UTC
Critical
EQUIPMENT_INOPERABLE
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Please check the processors' status.
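For anyone collecting these entries to show TAC how often the fault fires, here is a rough sketch that tallies the pasted fault lines by signal (IERR or P_CATERR) and state (critical or cleared). The regular expression is inferred only from the lines quoted in this thread and is not an official CIMC log format; the function name `tally_faults` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# [F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: ...
# Pattern is an assumption based on the entries pasted in this thread.
FAULT_RE = re.compile(r"\[(F\d+)\]\[(\w+)\]\[[\w-]+\]\[([^\]]+)\]\s+(\w+):")


def tally_faults(log_text):
    """Count fault lines per (signal, state), e.g. ('IERR', 'critical')."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            _code, state, _dn, signal = match.groups()
            counts[(signal, state)] += 1
    return counts


if __name__ == "__main__":
    sample = """\
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status.
[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Cleared
"""
    for (signal, state), count in sorted(tally_faults(sample).items()):
        print(f"{signal} {state}: {count}")
```

Lines that do not match the pattern (timestamps, severity labels) are simply skipped, so the whole fault history can be pasted in as-is.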
01-28-2020 08:44 AM
PM me your TAC SR case #
Thanks,
Kirk...
04-23-2020 05:24 AM
Kirk/Andrew - was this ever resolved? I am seeing this error occasionally on a C240 and hoping it's something easy to resolve. Any response/help is appreciated.
Thanks!
04-23-2020 05:33 AM
Hi there
It was, eventually. It had nothing to do with software drivers. Cisco sent a new chassis with the motherboard and CPUs already installed. The drives and other parts were swapped over and it's now fine. Have you logged a TAC case?
04-23-2020 05:38 AM
Thank you for the quick response. No, I haven't opened a case yet. We had the failure this morning and I was able to power off and power on the server using the CIMC. Everything looks healthy at this time. Thankfully this is a DR server and did not cause any production impact.
04-23-2020 05:47 AM - edited 04-23-2020 05:49 AM
I would log it ASAP. Chances are it'll happen again, and it appears to be a hardware issue in our case. The server was brand new also.