Beginner

Errors with UCS

I'm in the process of building out a large number of UCS chassis with B200 M3 blades (I'm on chassis 10 of 40), and I just encountered an error I haven't seen before. I can't find a doc that states what the error code means:

 

"IERR:Sensor Failure Asserted;"

Details - 
<computeHealthLedSensorAlarm
  alarmDesc="Sensor Failure Asserted"
  alarmSeverity="minor"
  dn="sys/chassis-3/blade-4/health-led/sensor-alarm-153"
  sensorId="153"
  sensorName="IERR">
</computeHealthLedSensorAlarm>
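
For anyone who wants to pull the relevant fields out of an alarm like this programmatically, here is a minimal Python sketch using only the standard library. It assumes the alarm is available as a standalone XML string like the one above; the function name `summarize_alarm` and the regular expression for the `dn` path are illustrative assumptions, not part of any Cisco tooling.

```python
import re
import xml.etree.ElementTree as ET

# The alarm fragment from this post, held as a plain string.
ALARM_XML = """
<computeHealthLedSensorAlarm
alarmDesc="Sensor Failure Asserted"
alarmSeverity="minor"
dn="sys/chassis-3/blade-4/health-led/sensor-alarm-153"
sensorId="153"
sensorName="IERR"
>
</computeHealthLedSensorAlarm>
"""


def summarize_alarm(xml_text):
    """Return the alarm's key attributes plus the chassis and blade
    numbers parsed from its dn. Hypothetical helper for illustration."""
    elem = ET.fromstring(xml_text.strip())
    # Assumed dn layout, based only on the example above:
    # sys/chassis-<n>/blade-<n>/...
    match = re.match(r"sys/chassis-(\d+)/blade-(\d+)/", elem.get("dn", ""))
    return {
        "sensor": elem.get("sensorName"),
        "sensor_id": elem.get("sensorId"),
        "severity": elem.get("alarmSeverity"),
        "description": elem.get("alarmDesc"),
        "chassis": int(match.group(1)) if match else None,
        "blade": int(match.group(2)) if match else None,
    }


print(summarize_alarm(ALARM_XML))
```

This only restates what the alarm already says (sensor IERR, severity minor, chassis 3, blade 4); it does not explain what the sensor tracks.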

 

It's reporting as "minor", but it's causing my server builds on these three blades to hang.

I'm reasonably certain it's not the chassis, as this chassis was running a set of M2 blades before I started this upgrade process.  This is also the second set of M3 blades I've had in the slots with issues, so I'm reasonably certain it's not the blades themselves.

 

What does this error code reference?  One possibility is memory, as that's the one component swapped over when I changed out the blades; but generally when there's an issue with memory it gets called out explicitly with which slot(s) are having problems rather than the error code above.

 

Any ideas?

Cisco Employee

Re: Errors with UCS

This code is indicative of a processor error. If you are seeing this on a single blade, it may require a motherboard replacement.

Beginner

Re: Errors with UCS

Which sensor does that error code track to?

 

And also, I replaced the physical blade with another one, and I'm getting the same error code occurring.  It seems odd that I'd suddenly have 6 bad blades with the same error codes.

Beginner

Re: Errors with UCS

Please do send me what that alert maps to if you can, but I've traced the issue to problems with some of the installed memory.

 

Actually, if there's a reference guide somewhere that just maps all of the error codes to sources that would be great; I could just refer to that when/if I encounter any more errors.

Cisco Employee

Re: Errors with UCS

An IERR is a processor error, sometimes indicative of a failed hardware component such as the system board.

 

How to Recover from an IERR for Intel® Server Boards

 

What am I seeing?
An IERR is a Processor Internal Error.

Why am I seeing it?
This signal indicates an unrecoverable processor error. A non-CPU event, such as a system bus interruption or a memory interruption, can also trigger it.

How to fix it
On the Intel® Server Boards listed at the bottom of this page, you can confirm or discard a Processor IERR from the Basic Input/Output System (BIOS) Setup Utility under Advanced > Processor Configuration > CPU Retest.

The IERR Filtering Algorithm helps you determine if the IERR signal came from a false CPU internal error or from another hardware source. This filtering algorithm helps you prevent unnecessary processor replacements. At the same time, this algorithm helps you to isolate IERR events. If the IERR returns after the CPU Retest, the IERR signal most likely came from the CPU itself. If you have more than one processor installed, check the System Event Log (SEL) to find out which processor is generating the IERR.

In some cases, a system restart can also eliminate an IERR.

I've tried checking the SEL and doing a system restart with no success. What else can I try?
If the problem persists:

  1. Try to start the system with one processor at a time.
  2. Test another processor if possible.
  3. Remove and reinstall the memory.

 

Please ensure the memory you are installing is supported and configured per the B200 M3 spec sheet. Otherwise, unexpected behavior may occur. I agree that if you are seeing this across multiple servers, it is likely related to memory configuration.
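
To make the single-blade versus multiple-blade distinction concrete, here is a rough Python sketch that counts how many distinct blades report the same sensor. The alarm list is hypothetical apart from the first entry (the `dn` quoted earlier in this thread), and the names `blades_by_sensor` and `likely_scope` are made up for illustration. The interpretation strings simply restate the advice given in this thread and are not an official diagnostic rule.

```python
import re
from collections import defaultdict

# (dn, sensorName) pairs. Only the first entry comes from this thread;
# the other two are hypothetical examples.
ALARMS = [
    ("sys/chassis-3/blade-4/health-led/sensor-alarm-153", "IERR"),
    ("sys/chassis-3/blade-5/health-led/sensor-alarm-153", "IERR"),
    ("sys/chassis-3/blade-6/health-led/sensor-alarm-153", "IERR"),
]


def blades_by_sensor(alarms):
    """Map each sensor name to the set of chassis/blade paths reporting it."""
    result = defaultdict(set)
    for dn, sensor in alarms:
        # Assumed dn layout: sys/chassis-<n>/blade-<n>/...
        match = re.match(r"sys/(chassis-\d+/blade-\d+)/", dn)
        if match:
            result[sensor].add(match.group(1))
    return result


def likely_scope(blade_count):
    """Restate the thread's rule of thumb for a given number of blades."""
    if blade_count == 1:
        return "single blade: possibly the system board"
    return "multiple blades: check memory support and configuration first"


for sensor, blades in blades_by_sensor(ALARMS).items():
    print(sensor, sorted(blades), "->", likely_scope(len(blades)))
```

With the same sensor firing on several blades at once, a shared factor such as the swapped memory is a more plausible suspect than several system boards failing together.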
