
C240 M5 F0174 hardware error

Hi folks, we keep seeing this error, after which the server crashes. Cisco has told me to update the i40e and megaraid drivers, which I don't believe will help. Is this a genuine hardware error?

7 REPLIES

Re: C240 M5 F0174 hardware error

2020 Jan 21 16:49:53 UTC
Informational
EQUIPMENT_INOPERABLE
[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Cleared

2020 Jan 21 16:45:09 UTC
Critical
EQUIPMENT_INOPERABLE
[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status.

Cisco Employee

Re: C240 M5 F0174 hardware error

That particular error, especially with no preceding DIMM failure alerts, frequently points to a CPU or system board issue, although the same message can also appear when the OS has problems interacting with the processor's C-state power settings.
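To look at the C-state side from the OS, here is a minimal Python sketch, assuming a Linux host that exposes the standard `cpuidle` sysfs tree under `/sys/devices/system/cpu`. The function name `list_cstates` and its `cpu_root` parameter are illustrative only and are not part of any Cisco tool. It reads which idle states the kernel exposes for one CPU and whether each is disabled; it does not change any setting.

```python
from pathlib import Path


def list_cstates(cpu_root="/sys/devices/system/cpu", cpu=0):
    """Return [(state_dir, name, disabled)] for one CPU's cpuidle states.

    Assumes the usual Linux layout: cpuN/cpuidle/stateM/{name,disable}.
    """
    idle_dir = Path(cpu_root) / f"cpu{cpu}" / "cpuidle"
    states = []
    if not idle_dir.is_dir():
        # cpuidle is not exposed for this CPU (or the CPU does not exist)
        return states
    # Keep only directories named state<number>, sorted numerically
    dirs = [d for d in idle_dir.glob("state*") if d.name[5:].isdigit()]
    for state in sorted(dirs, key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        disable_file = state / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() != "0"
        states.append((state.name, name, disabled))
    return states
```

Calling `list_cstates()` on the affected host and printing the result gives a quick record of the idle states in use, which could be attached to the TAC case next to the diag output.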

I would download the UCS diag ISO, let it run, and upload the results to a TAC case.

https://www.cisco.com/c/en/us/support/servers-unified-computing/ucs-c240-m5-rack-server-software/model.html#~tab-downloads

Description: Diagnostics for the Unified Computing System (UCS) C-Series Servers

ucs-cxx-diag.6.0.4b.iso

 

Kirk...


Re: C240 M5 F0174 hardware error

Thank you for the reply.

I have run the diagnostic tool previously and it did not highlight an issue; however, the server crashes after a few days in use, which is very odd. The same happens with RHEL 7.4 and 7.6. TAC want us to update the drivers because they are not in the compatibility matrix, which I am not sure about.

Cisco Employee

Re: C240 M5 F0174 hardware error

You might want to try booting a Linux-on-a-stick ISO distribution and let it sit for a few days to see if the problem goes away. If the problem repeats with a different OS (and likely different drivers), then you may want to provide that information to TAC.

Alternatively, you may want to leave the diag ISO image booted for a couple of days and trigger multiple rounds of tests.

 

Kirk...


Re: C240 M5 F0174 hardware error

Thanks Kirk, I will retry the diag utility. The server crashed again this evening with the same error message. I did update the CIMC and BIOS, which doesn't seem to have had any effect on the behavior.

 

I have tried booting the system using ucs-cxxx-drivers-linux.4.0.4d.iso as the boot media (to install different drivers, as per the TAC recommendation); however, I do not see the virtual DVD prompt. Is this the correct way to install updated drivers for the OS? RHEL 7.4 and 7.6 also use different drivers for each device type. I will also investigate booting from an alternate OS ISO.

 

Andy


Re: C240 M5 F0174 hardware error

Hi

The issue keeps happening and is more frequent now. I tried booting the system from a USB stick with Lubuntu 19 on it, and it locked up before it had fully loaded. Support are still claiming I need to update the i40e and megaraid drivers.

 

 Critical

EQUIPMENT_INOPERABLE

[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status.

2020 Jan 28 12:12:19 UTC

 Informational

EQUIPMENT_INOPERABLE

[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Cleared


2020 Jan 28 12:12:18 UTC

 Critical

EQUIPMENT_INOPERABLE

[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Please check the processors' status.

2020 Jan 28 11:47:46 UTC

 Informational

EQUIPMENT_INOPERABLE

[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Cleared

2020 Jan 28 11:35:37 UTC

 Critical

EQUIPMENT_INOPERABLE

[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status.

2020 Jan 28 11:35:36 UTC

 Informational

EQUIPMENT_INOPERABLE

[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Cleared


2020 Jan 28 11:35:35 UTC

 Critical

EQUIPMENT_INOPERABLE

[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] P_CATERR: A catastrophic fault has occurred on one of the processors: Please check the processors' status.
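The fault lines pasted above follow a regular bracketed layout, so they can be tallied by sensor name when comparing crashes. A minimal Python sketch, assuming every fault line looks exactly like the ones in this thread; `parse_fault` and `count_by_sensor` are hypothetical helper names, not a CIMC API.

```python
import re
from collections import Counter

# Matches lines such as:
# [F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic ...
FAULT_RE = re.compile(
    r"\[(?P<code>F\d+)\]\[(?P<severity>[^\]]+)\]\[(?P<cause>[^\]]+)\]"
    r"\[(?P<dn>[^\]]+)\]\s+(?P<sensor>\w+):\s*(?P<message>.*)"
)


def parse_fault(line):
    """Return the fields of one fault line as a dict, or None if it does not match."""
    match = FAULT_RE.search(line)
    return match.groupdict() if match else None


def count_by_sensor(lines, severity="critical"):
    """Count fault lines of the given severity, grouped by sensor (IERR, P_CATERR, ...)."""
    counts = Counter()
    for line in lines:
        fault = parse_fault(line)
        if fault and fault["severity"] == severity:
            counts[fault["sensor"]] += 1
    return counts
```

Run over the 28 Jan excerpt above, `count_by_sensor` would report two critical `IERR` and two critical `P_CATERR` entries. Lines that are not fault records, such as timestamps and severity labels, are skipped.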

 
Cisco Employee

Re: C240 M5 F0174 hardware error

PM me your TAC SR case #

 

Thanks,

Kirk...
