08-19-2024 11:26 AM - edited 08-19-2024 11:29 AM
DIMMs A0 & A1 failed on my B200-M3 blade. I replaced the DIMMs with new ones. However, the 'Faults' tab in UCS Mgr still shows them as inoperable and the 'Effective memory' value on the 'General' tab is still lower than it should be.
I see this in the SEL logs:
6 | 07/26/2024 14:30:29 EDT | BIOS | System Firmware Progress #0x06 | System Firmware error | CPU1 DIMM A0 Memory Failed (Blacklisted by BMC). | Asserted
7 | 07/26/2024 14:30:29 EDT | BIOS | System Firmware Progress #0x06 | System Firmware error | CPU1 DIMM A1 Memory Failed (Blacklisted by BMC). | Asserted
8 | 07/26/2024 14:30:32 EDT | CIMC | Entity presence BIOS_POST_CMPLT #0x4b | Device Present | Asserted
9 | 08/12/2024 21:16:48 EDT | CIMC | Memory DDR3_P1_A0_ECC #0x72 | Upper Non-recoverable - going high | Deasserted | Reading 0 <= Threshold 60250 error
a | 08/12/2024 21:16:48 EDT | CIMC | Memory DDR3_P1_A1_ECC #0x73 | Upper Non-recoverable - going high | Deasserted | Reading 0 <= Threshold 60250 error
b | 08/12/2024 21:16:48 EDT | CIMC | Platform alert LED_BLADE_STATUS #0x95 | LED color is green | Asserted
Lines '9' & 'a' imply that the new DIMMs are fine, and lines '6' & '7' seem to imply that UCS isn't allowing them to operate because those two slots have been 'blacklisted'. Is that an accurate inference on my part? If so, is there a way to 'unblacklist' those slots?
Note: I cleared the memory errors at the CLI (scope server [chassis]/[server] -> reset-all-memory-errors -> commit) but that didn't change anything.
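For anyone comparing their own SEL output against the excerpt above, here is a rough Python sketch of the same inference: it lists DIMMs with an asserted "Blacklisted by BMC" entry and ECC sensors whose non-recoverable threshold event was later deasserted. It assumes the pipe-delimited layout shown in the excerpt; the function names are made up for illustration and are not part of any Cisco tool.

```python
import re

# Sample entries copied from the SEL excerpt in this post.
SEL_LINES = [
    "6 | 07/26/2024 14:30:29 EDT | BIOS | System Firmware Progress #0x06 | System Firmware error | CPU1 DIMM A0 Memory Failed (Blacklisted by BMC). | Asserted",
    "7 | 07/26/2024 14:30:29 EDT | BIOS | System Firmware Progress #0x06 | System Firmware error | CPU1 DIMM A1 Memory Failed (Blacklisted by BMC). | Asserted",
    "8 | 07/26/2024 14:30:32 EDT | CIMC | Entity presence BIOS_POST_CMPLT #0x4b | Device Present | Asserted",
    "9 | 08/12/2024 21:16:48 EDT | CIMC | Memory DDR3_P1_A0_ECC #0x72 | Upper Non-recoverable - going high | Deasserted | Reading 0 <= Threshold 60250 error",
    "a | 08/12/2024 21:16:48 EDT | CIMC | Memory DDR3_P1_A1_ECC #0x73 | Upper Non-recoverable - going high | Deasserted | Reading 0 <= Threshold 60250 error",
    "b | 08/12/2024 21:16:48 EDT | CIMC | Platform alert LED_BLADE_STATUS #0x95 | LED color is green | Asserted",
]


def split_fields(line):
    """Split one SEL line on '|' and trim whitespace (assumed layout)."""
    return [field.strip() for field in line.split("|")]


def blacklisted_dimms(lines):
    """Return DIMMs that have an asserted 'Blacklisted by BMC' entry."""
    pattern = re.compile(r"(CPU\d+ DIMM \w+) Memory Failed \(Blacklisted by BMC\)")
    found = []
    for line in lines:
        fields = split_fields(line)
        # Exact match on the field, so 'Deasserted' is not counted.
        if "Asserted" not in fields:
            continue
        for field in fields:
            match = pattern.search(field)
            if match:
                found.append(match.group(1))
    return found


def cleared_ecc_sensors(lines):
    """Return ECC sensors whose threshold event was deasserted."""
    pattern = re.compile(r"Memory (DDR3_P\d+_\w+_ECC)")
    found = []
    for line in lines:
        fields = split_fields(line)
        if "Deasserted" not in fields or len(fields) < 4:
            continue
        match = pattern.search(fields[3])  # fourth field holds the sensor name
        if match:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    print("Blacklisted:", blacklisted_dimms(SEL_LINES))
    print("ECC cleared:", cleared_ecc_sensors(SEL_LINES))
```

A slot that appears in both lists (A0 and A1 on CPU1 here) matches the situation described: the sensor reading looks healthy, but the earlier blacklist entry is still on record.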
08-19-2024 11:38 PM
- This could be related : https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvs31291
+ Also have a look at https://davidring.ie/2014/07/11/cisco-ucs-blades-memory-troubleshooting/
M.
08-20-2024 02:56 PM
That second link suggested resetting the CIMC and re-acknowledging the server. I tried that and it fixed it - the slots are now recognized and the RAM is back to its normal level.
Thank you so much!!