
C240 M5 F0174 hardware error

andrew.maudsley
Level 1

Hi folks, we keep seeing this error and the server crashes. Cisco is telling me to update the i40e and megaraid drivers, which I don't believe will help. Is this a genuine hardware error?


andrew.maudsley
Level 1
Level 1

2020 Jan 21 16:49:53 UTC
Informational
EQUIPMENT_INOPERABLE
[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Cleared

2020 Jan 21 16:45:09 UTC
Critical
EQUIPMENT_INOPERABLE
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status.
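
When comparing many of these entries, it can help to split each fault line into its fields. The following is a minimal sketch, assuming the layout `[code][severity][cause][object path] SENSOR: description` as inferred only from the lines quoted in this thread; the field names (`code`, `severity`, `cause`, `dn`, `sensor`) are illustrative labels, not an official CIMC schema.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the log lines quoted in this thread:
# [code][severity][cause][object path] SENSOR: description
FAULT_RE = re.compile(
    r"^\[(?P<code>F\d+)\]\[(?P<severity>[^\]]+)\]\[(?P<cause>[^\]]+)\]"
    r"\[(?P<dn>[^\]]+)\]\s+(?P<sensor>[A-Z_]+):\s+(?P<description>.*)$"
)


class Fault(NamedTuple):
    code: str         # e.g. F0174
    severity: str     # e.g. critical or cleared
    cause: str        # e.g. equipment-inoperable
    dn: str           # e.g. sys/rack-unit-1/board
    sensor: str       # e.g. IERR or P_CATERR
    description: str  # free text after the sensor name


def parse_fault(line: str) -> Optional[Fault]:
    """Return the parsed fields, or None if the line does not match."""
    match = FAULT_RE.match(line.strip())
    return Fault(**match.groupdict()) if match else None


if __name__ == "__main__":
    sample = (
        "[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] "
        "IERR: A catastrophic fault has occurred on one of the processors: "
        "Please check the processor's status."
    )
    print(parse_fault(sample))
```

Grouping the parsed entries by `sensor` and `severity` makes it easier to see whether IERR and P_CATERR are raised together, which may be useful detail for a TAC case.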

Kirk J
Cisco Employee

That particular error, especially if there were no preceding DIMM failure alerts, is frequently either a CPU or system board issue, although you can also see that message when there are problems with how the OS interacts with the processor's C-state power settings.
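
To see which C-states the running Linux kernel exposes, one option is to read the cpuidle entries in sysfs. This is a minimal sketch, assuming a Linux host with the standard cpuidle layout under `/sys/devices/system/cpu/cpuN/cpuidle/`; the function name `list_cstates` is hypothetical, and the output only reflects the OS view, not the BIOS power settings.

```python
from pathlib import Path
from typing import Dict, Union


def list_cstates(
    cpu_dir: Union[str, Path] = "/sys/devices/system/cpu/cpu0",
) -> Dict[str, bool]:
    """Map each idle state name of one CPU to whether it is enabled."""
    states: Dict[str, bool] = {}
    state_dirs = sorted(
        Path(cpu_dir).glob("cpuidle/state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # state2 before state10
    )
    for state in state_dirs:
        name = (state / "name").read_text().strip()
        disable = state / "disable"
        # A "disable" file containing 1 means the kernel will not use the state.
        disabled = disable.exists() and disable.read_text().strip() == "1"
        states[name] = not disabled
    return states


if __name__ == "__main__":
    for name, enabled in list_cstates().items():
        print(f"{name}: {'enabled' if enabled else 'disabled'}")
```

If no states are listed, the cpuidle driver may not be active on that system, so an empty result should not be read as proof that C-states are off in the BIOS.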

I would pull down the UCS diag ISO and let that run, and upload/provide the results of that to a TAC case.

https://www.cisco.com/c/en/us/support/servers-unified-computing/ucs-c240-m5-rack-server-software/model.html#~tab-downloads

Description: Diagnostics for the Unified Computing System (UCS) C-Series Servers

ucs-cxx-diag.6.0.4b.iso

 

Kirk...

Thank you for the reply.

I have run the diagnostic tool previously and it does not seem to highlight an issue. However, we see the server crash after a few days in use, which is very odd. The same happens with RHEL 7.4 or 7.6. TAC want us to update the drivers as they are not in the compatibility matrix, which I am not sure of.
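
Before comparing against the compatibility matrix, it helps to record which driver versions are actually loaded. This is a minimal sketch, assuming a Linux host where loaded modules appear under `/sys/module/<name>/` and may expose a `version` file; the module name `megaraid_sas` is an assumption for the "megaraid" driver mentioned in this thread, and the function name `loaded_driver_versions` is hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional, Union


def loaded_driver_versions(
    modules: Iterable[str] = ("i40e", "megaraid_sas"),  # assumed module names
    sys_module: Union[str, Path] = "/sys/module",
) -> Dict[str, Optional[str]]:
    """Return the version of each module that is present in sysfs.

    Modules that are not loaded are omitted. A value of None means the
    module is present but does not publish a version file.
    """
    versions: Dict[str, Optional[str]] = {}
    for mod in modules:
        mod_dir = Path(sys_module) / mod
        if not mod_dir.is_dir():
            continue  # module not loaded (or not built in)
        version_file = mod_dir / "version"
        if version_file.is_file():
            versions[mod] = version_file.read_text().strip()
        else:
            versions[mod] = None
    return versions


if __name__ == "__main__":
    for mod, version in loaded_driver_versions().items():
        print(f"{mod}: {version or 'no version published'}")
```

The versions printed this way can then be checked by hand against the matrix entry for the installed firmware and OS release.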

You might want to try booting a live Linux distribution from a USB stick and letting it sit for a few days to see if the problem goes away. If the problem repeats with a different OS (and likely different drivers), then you may want to provide that information to TAC.

Alternatively, you may want to leave the DIAG iso/image booted up for a couple of days and trigger multiple rounds of tests.

 

Kirk...

Thanks Kirk, I will retry the diag util. The server crashed again this evening with the same error message. I did update the CIMC and BIOS, which doesn't seem to have had any effect on the behavior.

 

I have tried booting the system using ucs-cxxx-drivers-linux.4.0.4d.iso as the boot media (this is to install different drivers as per the TAC recommendation), but I do not see the virtual DVD prompt. Is this the correct way to install updated drivers for the OS? RHEL 7.4 and 7.6 also use different drivers for each device type. I will investigate booting from an alternate OS ISO as well.

 

Andy

Hi

The issue keeps happening and is more frequent now. I tried booting the system from a USB stick with Lubuntu 19 on it, and it locked up before it fully loaded. Support are still claiming I need to update the i40e and megaraid drivers.

 

Critical
EQUIPMENT_INOPERABLE
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status.

2020 Jan 28 12:12:19 UTC
Informational
EQUIPMENT_INOPERABLE
[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Cleared

2020 Jan 28 12:12:18 UTC
Critical
EQUIPMENT_INOPERABLE
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Please check the processors' status.

2020 Jan 28 11:47:46 UTC
Informational
EQUIPMENT_INOPERABLE
[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Cleared

2020 Jan 28 11:35:37 UTC
Critical
EQUIPMENT_INOPERABLE
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status.

2020 Jan 28 11:35:36 UTC
Informational
EQUIPMENT_INOPERABLE
[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Cleared

2020 Jan 28 11:35:35 UTC
Critical
EQUIPMENT_INOPERABLE
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Please check the processors' status.

 

PM me your TAC SR case #

 

Thanks,

Kirk...

Kirk/Andrew - was this ever resolved? I am seeing this error occasionally on a C240 and hoping it's something easy to resolve. Any response/help is appreciated.

 

Thanks!

Hi there

 

It was eventually, and it had nothing to do with software drivers. Cisco sent a new chassis with the motherboard already installed, plus CPUs. The drives and other parts were swapped over and it's now fine. Have you logged a TAC case?

Thank you for the quick response. No, I haven't opened a case yet. We had the failure this morning and I was able to power off and power on the server using the CIMC. Everything looks healthy at this time. Thankfully this is a DR server and did not cause any production impact.

I would log it ASAP. Chances are it'll happen again, and it turned out to be a hardware issue in our case. The server was also brand new.
