
UCS Mini - Three Cisco Blades - All report strange ECC memory issues in SEL logs

spphelpdesk
Level 1

I have a Cisco UCS Mini Blade Center with two B200-M4s and one C240-M4. All three blades have memory installed in the first two DIMM slots, i.e. A1 and A2 but not A3 ... H1 and H2 but not H3, etc.

All three blades have ECC memory errors being logged in the SEL logs.

The rather curious thing is that all three blades report these, and for every DIMM, but ONLY for DIMMs in slot 2. There is not a single ECC error logged in any of the three blades for a DIMM in slot 1.

Like this:

41 | 01/09/2017 10:30:37 | CIMC | Memory DDR4_P1_A2_ECC #0xa4 | read 1 correctable ECC errors on CPU1 DIMM A2 | Asserted
42 | 01/09/2017 10:31:01 | CIMC | Memory DDR4_P1_A2_ECC #0xa4 | read 2 correctable ECC errors on CPU1 DIMM A2 | Asserted
43 | 01/09/2017 10:31:12 | CIMC | Memory DDR4_P1_A2_ECC #0xa4 | read 2 correctable ECC errors on CPU1 DIMM A2 | Asserted
44 | 01/09/2017 10:32:21 | CIMC | Memory DDR4_P1_B2_ECC #0xa7 | read 1 correctable ECC errors on CPU1 DIMM B2 | Asserted
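One way to check the slot pattern numerically is to tally the reported counts per DIMM and per slot number. The sketch below is a minimal Python example that assumes SEL entries are pipe-separated with the description in the fifth field and worded exactly like the four lines in this excerpt; the regular expression and the function name `tally_correctable` are illustrative assumptions, not a Cisco tool or API.

```python
import re
from collections import Counter

# Pattern for the description field as worded in the excerpt, e.g.
# "read 2 correctable ECC errors on CPU1 DIMM A2".
# Assumption: other firmware versions may phrase this differently.
ECC_RE = re.compile(r"read (\d+) correctable ECC errors on CPU(\d+) DIMM ([A-H])(\d+)")


def tally_correctable(sel_lines):
    """Sum correctable ECC error counts per DIMM and per slot number."""
    per_dimm = Counter()
    per_slot = Counter()
    for line in sel_lines:
        # Assumed layout: id | timestamp | source | sensor | description | state
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue
        match = ECC_RE.search(fields[4])
        if match is None:
            continue  # not a correctable ECC entry
        count = int(match.group(1))
        cpu, bank, slot = match.group(2), match.group(3), int(match.group(4))
        per_dimm[f"CPU{cpu} {bank}{slot}"] += count
        per_slot[slot] += count
    return per_dimm, per_slot


sample = [
    "41 | 01/09/2017 10:30:37 | CIMC | Memory DDR4_P1_A2_ECC #0xa4 | read 1 correctable ECC errors on CPU1 DIMM A2 | Asserted",
    "42 | 01/09/2017 10:31:01 | CIMC | Memory DDR4_P1_A2_ECC #0xa4 | read 2 correctable ECC errors on CPU1 DIMM A2 | Asserted",
    "43 | 01/09/2017 10:31:12 | CIMC | Memory DDR4_P1_A2_ECC #0xa4 | read 2 correctable ECC errors on CPU1 DIMM A2 | Asserted",
    "44 | 01/09/2017 10:32:21 | CIMC | Memory DDR4_P1_B2_ECC #0xa7 | read 1 correctable ECC errors on CPU1 DIMM B2 | Asserted",
]

per_dimm, per_slot = tally_correctable(sample)
print(dict(per_dimm))  # {'CPU1 A2': 5, 'CPU1 B2': 1}
print(dict(per_slot))  # {2: 6}
```

Run over a full day's export from each blade, a slot 1 total of zero next to a non-zero slot 2 total would confirm the pattern in numbers; on its own it does not say whether the errors point to a hardware fault.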

I'm getting these errors logged every few minutes.

I can't imagine ALL DIMMs in slot 2 of banks A through H on ALL three blades need replacing.

Has anyone seen this before? Any recommendations before I log a support call?

Firmware on our UCS is 3.1(2c).

Accepted Solution

Your engineer is correct; this is normal and not cause for concern, as they are correctable ECC errors.

If you were seeing uncorrectable ECC errors, that would be cause for concern; however, based on the log excerpt you provided in the initial description, that's not the case.

-AJ Felts

Cisco TAC


Walter Dey
VIP Alumni

Was this memory preinstalled, or did you install it yourself?

Page 15 of the spec sheet shows how to populate the DIMMs:

http://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-b-series-blade-servers/b200m4-specsheet.pdf

They are populated as specified on page 15 of that document.

The memory was installed by a Cisco engineer. Everything is working apart from these errors, which do not flag up in UCS Manager.

I would open a TAC case to find out whether this is a real (recoverable) error or simply a cosmetic issue.

I've done that.

The first-line support person said it was normal and referred me to a Cisco white paper which shows that it is normal.

I highly doubt that. He got me to issue a few reset commands, but the SEL logs are still being populated with many correctable errors.

You should escalate the TAC case (first-line support might not be happy).

The fact is, you don't see any memory errors in UCSM, only in the SEL, which is hardware monitoring.

Correctable ECC memory errors are acceptable and do not indicate a hardware issue, especially since you're only seeing single-digit ECC error counts.

If you were seeing uncorrectable ECC errors, that would be cause for concern and would warrant a TAC case.

-AJ Felts

Cisco TAC
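The rule of thumb in this reply (correctable errors: no action; uncorrectable errors: open a TAC case) can be written as a simple filter over SEL descriptions. This is a minimal sketch that assumes uncorrectable entries contain the word "uncorrectable"; the thread only shows correctable entries, so the wording of real uncorrectable events is an assumption, as are the hypothetical names `classify_ecc` and `needs_escalation`.

```python
def classify_ecc(description: str) -> str:
    """Classify an SEL description as 'uncorrectable', 'correctable' or 'other'.

    Assumption: uncorrectable events contain the word "uncorrectable";
    verify against real logs before relying on this.
    """
    text = description.lower()
    # Check "uncorrectable" first: it contains "correctable" as a substring.
    if "uncorrectable" in text:
        return "uncorrectable"
    if "correctable" in text:
        return "correctable"
    return "other"


def needs_escalation(descriptions) -> bool:
    """True if any entry is uncorrectable, per the rule of thumb in this reply."""
    return any(classify_ecc(d) == "uncorrectable" for d in descriptions)


print(needs_escalation(["read 1 correctable ECC errors on CPU1 DIMM A2"]))  # False
```

Checking for "uncorrectable" before "correctable" matters, because a naive substring test for "correctable" would also match every uncorrectable entry.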

So it is normal that, on three blades, EVERY slot 2 DIMM in banks A through H reports correctable ECC errors, adding up to 300 entries per day per blade in the SEL logs?

Whilst NONE in slot 1 report any issues, correctable or uncorrectable.

The sample I posted shows single-digit counts, but that is not the full report; some entries report up to 80 correctable errors on a single DIMM.

I suspect that this cannot be right at all.
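For scale, 300 SEL entries per day averages to one entry every 288 seconds (about 4.8 minutes), consistent with the "every few minutes" in the original post, while the four entries in the excerpt are on average about 35 seconds apart. The minimal sketch below does both calculations; it assumes the timestamp format seen in the excerpt, and the function names are hypothetical.

```python
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60


def mean_interval_seconds(timestamps, fmt="%m/%d/%Y %H:%M:%S"):
    """Average gap between consecutive SEL timestamps, in seconds.

    The day/month order in the excerpt is ambiguous; it does not change
    the gaps between entries logged on the same day.
    """
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    if len(times) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def mean_interval_from_daily_count(entries_per_day):
    """Average gap implied by a daily entry count, in seconds."""
    return SECONDS_PER_DAY / entries_per_day


excerpt = [
    "01/09/2017 10:30:37",
    "01/09/2017 10:31:01",
    "01/09/2017 10:31:12",
    "01/09/2017 10:32:21",
]
print(round(mean_interval_seconds(excerpt), 1))  # 34.7
print(mean_interval_from_daily_count(300))  # 288.0
```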

Please review the following documentation:

http://www.cisco.com/c/dam/en/us/support/docs/servers-unified-computing/ucs-b-series-blade-servers/ManagingCorrectableMemoryErrorsFinalApril82016.pdf

-AJ Felts

Cisco TAC

Thanks, my TAC engineer already gave me that.

I'm still struggling to believe that 8 DIMMs, all in slot 2, across all three blades exhibit ECC errors while the slot 1 DIMMs do not.

Your engineer is correct; this is normal and not cause for concern, as they are correctable ECC errors.

If you were seeing uncorrectable ECC errors, that would be cause for concern; however, based on the log excerpt you provided in the initial description, that's not the case.

-AJ Felts

Cisco TAC
