01-09-2017 02:42 AM - edited 03-01-2019 01:01 PM
I have a Cisco UCS Mini blade center with two B200-M4s and one C240-M4. All three blades have memory installed in the first two DIMM slots of each bank, i.e. A1 and A2 but not A3 ... H1 and H2 but not H3, etc.
All three blades are logging ECC memory errors in the SEL logs.
The rather curious thing is that all three blades report these, and for every bank, but ONLY for DIMMs in slot 2. There is not one ECC error logged in any of the three blades for a DIMM in slot 1.
Like this:
41 | 01/09/2017 10:30:37 | CIMC | Memory DDR4_P1_A2_ECC #0xa4 | read 1 correctable ECC errors on CPU1 DIMM A2 | Asserted
42 | 01/09/2017 10:31:01 | CIMC | Memory DDR4_P1_A2_ECC #0xa4 | read 2 correctable ECC errors on CPU1 DIMM A2 | Asserted
43 | 01/09/2017 10:31:12 | CIMC | Memory DDR4_P1_A2_ECC #0xa4 | read 2 correctable ECC errors on CPU1 DIMM A2 | Asserted
44 | 01/09/2017 10:32:21 | CIMC | Memory DDR4_P1_B2_ECC #0xa7 | read 1 correctable ECC errors on CPU1 DIMM B2 | Asserted
I'm getting these errors logged every few minutes.
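To put numbers on this, SEL entries like the ones above can be tallied per DIMM and per slot position. What follows is only a rough sketch: the regular expression is modeled on the four sample lines in this post, so it is an assumption that other entries (or other firmware versions) use the same wording, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Pattern modeled on the sample lines in this post (an assumption:
# other firmware versions may word SEL entries differently).
SEL_ECC = re.compile(
    r"read (?P<count>\d+) correctable ECC errors "
    r"on CPU(?P<cpu>\d+) DIMM (?P<dimm>[A-H]\d)"
)

def tally_correctable(lines):
    """Return {(cpu, dimm): {"entries", "errors", "max"}} for matching lines."""
    stats = defaultdict(lambda: {"entries": 0, "errors": 0, "max": 0})
    for line in lines:
        m = SEL_ECC.search(line)
        if m is None:
            continue  # not a correctable-ECC entry in the expected format
        count = int(m["count"])
        s = stats[(int(m["cpu"]), m["dimm"])]
        s["entries"] += 1                 # number of SEL entries for this DIMM
        s["errors"] += count              # sum of the reported error counts
        s["max"] = max(s["max"], count)   # largest count in a single entry
    return dict(stats)

def by_slot_position(stats):
    """Sum reported errors by slot number (the digit in A1, A2, ...)."""
    totals = defaultdict(int)
    for (_cpu, dimm), s in stats.items():
        totals[dimm[-1]] += s["errors"]
    return dict(totals)

sample = [
    "41 | 01/09/2017 10:30:37 | CIMC | Memory DDR4_P1_A2_ECC #0xa4 | read 1 correctable ECC errors on CPU1 DIMM A2 | Asserted",
    "42 | 01/09/2017 10:31:01 | CIMC | Memory DDR4_P1_A2_ECC #0xa4 | read 2 correctable ECC errors on CPU1 DIMM A2 | Asserted",
    "43 | 01/09/2017 10:31:12 | CIMC | Memory DDR4_P1_A2_ECC #0xa4 | read 2 correctable ECC errors on CPU1 DIMM A2 | Asserted",
    "44 | 01/09/2017 10:32:21 | CIMC | Memory DDR4_P1_B2_ECC #0xa7 | read 1 correctable ECC errors on CPU1 DIMM B2 | Asserted",
]

stats = tally_correctable(sample)
for (cpu, dimm), s in sorted(stats.items()):
    print(f"CPU{cpu} DIMM {dimm}: {s['entries']} entries, "
          f"{s['errors']} errors, max {s['max']} per entry")
print(by_slot_position(stats))
```

Run against a full SEL export, a summary like this would show at a glance whether slot 1 really is at zero and how high the per-entry counts go.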
I can't imagine that ALL DIMMs in slot 2 of banks A through H on ALL three blades need replacing.
Has anyone seen this before? Any recommendation before I log a support call?
Firmware on our UCS is 3.1(2c).
01-10-2017 11:44 AM
Your engineer is correct, this is normal and not cause for concern as they are Correctable ECC errors.
If you were seeing Uncorrectable ECC errors, that's cause for concern, however based off the log file excerpt you've provided in the initial description, that's not the case.
-AJ Felts
Cisco TAC
01-09-2017 10:13 AM
Was this memory preinstalled, or did you install it yourself?
Page 15 of the spec sheet
http://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-b-series-blade-servers/b200m4-specsheet.pdf
shows how to populate the DIMMs.
01-09-2017 10:22 AM
They are populated as specified on P15 of that link.
The memory was installed by a Cisco engineer. Everything is working apart from these errors, which are not flagged in UCS Manager.
01-09-2017 09:47 PM
I would open a TAC case to find out whether this is a real recoverable error or simply a cosmetic issue.
01-10-2017 06:56 AM
I've done that.
The first-line support person said it was normal and referred me to a Cisco white paper that says the same.
I highly doubt that. He had me issue a few reset commands, but the SEL logs still fill up with correctable errors.
01-10-2017 11:30 AM
You should escalate the TAC case (first line support might not be happy).
The fact is that you don't see any memory errors in UCSM, only in the SEL, which is hardware monitoring.
01-10-2017 11:39 AM
Correctable ECC memory errors are acceptable and do not indicate a hardware issue, especially since you're only seeing single-digit ECC error counts.
If you were seeing Uncorrectable ECC errors, that's cause for concern and would warrant a TAC Case.
-AJ Felts
Cisco TAC
01-10-2017 11:53 AM
So it is normal that on three blades, in banks A through H, EVERY slot 2 is reporting correctable ECC errors, with around 300 SEL entries per day for each blade?
And that NONE of the DIMMs in slot 1 report any issues, correctable or uncorrectable?
The sample I posted shows single-digit counts, but that is not the full report; some entries report up to 80 correctable errors on a single DIMM.
I suspect that this cannot be right at all.
01-10-2017 11:57 AM
Please review the following documentation:
http://www.cisco.com/c/dam/en/us/support/docs/servers-unified-computing/ucs-b-series-blade-servers/ManagingCorrectableMemoryErrorsFinalApril82016.pdf
AJ Felts
-Cisco TAC
01-10-2017 12:25 PM
Thanks, my TAC engineer already gave me that.
I'm still struggling to believe that 8 DIMMs, all in slot 2, across all three blades exhibit ECC errors while the DIMMs in slot 1 do not.