05-07-2018 10:00 AM - edited 03-01-2019 01:32 PM
I'm in the process of building out rather a lot of UCS B200 M3 chassis (I'm on chassis 10 of 40), and just encountered an error I haven't seen before and can't really find a doc that states what the error code is for:
"IERR:Sensor Failure Asserted;"
Details -
<computeHealthLedSensorAlarm
alarmDesc="Sensor Failure Asserted"
alarmSeverity="minor"
dn="sys/chassis-3/blade-4/health-led/sensor-alarm-153"
sensorId="153"
sensorName="IERR"
>
</computeHealthLedSensorAlarm>
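For anyone triaging the same alarm across many blades, here is a minimal sketch of pulling the key fields out of the fragment above with Python's standard library. The helper name `summarize_alarm` and the dictionary keys are my own for illustration and not part of any Cisco tooling; it also assumes the `dn` always contains `chassis-N/blade-M`, as it does in this one sample.

```python
import re
import xml.etree.ElementTree as ET

# The alarm fragment exactly as quoted in this post.
ALARM_XML = """<computeHealthLedSensorAlarm
alarmDesc="Sensor Failure Asserted"
alarmSeverity="minor"
dn="sys/chassis-3/blade-4/health-led/sensor-alarm-153"
sensorId="153"
sensorName="IERR"
>
</computeHealthLedSensorAlarm>"""

# Assumption: the dn embeds the location as "chassis-<n>/blade-<m>".
DN_PATTERN = re.compile(r"chassis-(\d+)/blade-(\d+)")


def summarize_alarm(xml_text):
    """Return the sensor, severity, and chassis/blade location of one alarm."""
    elem = ET.fromstring(xml_text)
    match = DN_PATTERN.search(elem.get("dn", ""))
    return {
        "sensor": elem.get("sensorName"),
        "severity": elem.get("alarmSeverity"),
        "description": elem.get("alarmDesc"),
        # Location is None when the dn does not match the assumed pattern.
        "chassis": int(match.group(1)) if match else None,
        "blade": int(match.group(2)) if match else None,
    }


summary = summarize_alarm(ALARM_XML)
print(summary)
```

Running this over the alarms from each affected blade and comparing the chassis/blade values should make it easier to see whether the fault follows the slot or the hardware moved into it.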
It's reporting as "minor", but it's causing my server builds on these three blades to hang.
I'm reasonably certain it's not the chassis, as this chassis was running a set of M2 blades before I started this upgrade process. This is also the second set of M3 blades I've had in the slots with issues, so I'm reasonably certain it's not the blades themselves.
What does this error code reference? One possibility is memory, as that's the one component swapped over when I changed out the blades; but generally when there's an issue with memory it gets called out explicitly with which slot(s) are having problems rather than the error code above.
Any ideas?
05-07-2018 10:03 AM - edited 05-07-2018 03:44 PM
This code is indicative of a processor error. If you are seeing this on a single blade, it may require a motherboard replacement.
05-07-2018 10:05 AM
Which sensor does that error code track to?
And also, I replaced the physical blade with another one, and I'm getting the same error code occurring. It seems odd that I'd suddenly have 6 bad blades with the same error codes.
05-07-2018 02:50 PM
Please do send me what that alert maps to if you can, but I've traced the issue to problems with some of the installed memory.
Actually, if there's a reference guide somewhere that just maps all of the error codes to sources that would be great; I could just refer to that when/if I encounter any more errors.
05-07-2018 03:42 PM
An IERR is a processor error, sometimes indicative of a failed hardware component such as the system board.
How to Recover from an IERR for Intel® Server Boards
What am I seeing?
An IERR is a Processor Internal Error.
Why am I seeing it?
An IERR is a signal that indicates an unrecoverable processor error. A non-CPU event, such as a system bus interruption or a memory interruption, can also raise this signal.
How to fix it
On the Intel® Server Boards listed at the bottom of this page, you can confirm or discard a Processor IERR from the Basic Input/Output System (BIOS) Setup Utility under Advanced > Processor Configuration > CPU Retest.
The IERR Filtering Algorithm helps you determine if the IERR signal came from a false CPU internal error or from another hardware source. This filtering algorithm helps you prevent unnecessary processor replacements. At the same time, this algorithm helps you to isolate IERR events. If the IERR returns after the CPU Retest, the IERR signal most likely came from the CPU itself. If you have more than one processor installed, check the System Event Log (SEL) to find out which processor is generating the IERR.
In some cases, a system restart can also eliminate an IERR.
I've tried checking the SEL and doing a system restart with no success. What else can I try?
If the problem persists:
Please ensure you are installing memory that is supported and configured per the B200-M3 Spec Sheet; otherwise, unexpected behavior may occur. I agree that if you are seeing this across multiple servers, it is likely related to memory configuration.
08-06-2021 05:28 AM - edited 08-06-2021 05:28 AM
Hello,
I am having the same issue: after replacing the server with another one in the same chassis slot, I got the same error.
I would appreciate it if you could share how you were able to solve this issue.
Thank You,