CscTsWebDocs
Community Member
[Symptom]

If the following fault event is raised in UCSM, the CIMC log may show that a correctable ECC error has occurred on a DIMM.

Severity: Minor
Code: F0184
Description: DIMM A2 on server X/Y operability: degraded
Name: Memory Unit Degraded
Cause: Equipment Degraded
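
When many faults have to be triaged, it can help to pull the DIMM slot and server out of the fault text programmatically. The following is a minimal sketch that assumes the fault is available as plain "Key: Value" lines exactly as laid out above; `parse_fault` and `dimm_location` are hypothetical helper names, not part of any Cisco tool, and a real UCSM export may be formatted differently.

```python
import re

# Sample text copied from the fault event shown above.
FAULT_TEXT = """\
Severity: Minor
Code: F0184
Description: DIMM A2 on server X/Y operability: degraded
Name: Memory Unit Degraded
Cause: Equipment Degraded
"""


def parse_fault(text):
    """Split "Key: Value" lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def dimm_location(description):
    """Return (dimm_slot, server_id) from the Description field, or None."""
    match = re.search(r"DIMM (\S+) on server (\S+)", description)
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    fault = parse_fault(FAULT_TEXT)
    if fault.get("Code") == "F0184":
        print(dimm_location(fault["Description"]))
```
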

 

A correctable ECC error means that a single-bit error was detected in data read from the DIMM and was corrected.

Because the corrected data is what is passed to the OS, OS behavior is not affected even though the error occurred.

Note that only the data as read from the DIMM is corrected. The data stored on the DIMM itself is not.

As a result, the same location may raise a correctable ECC error again until the faulty data is overwritten with other data, or until the data on the DIMM is cleared by restarting the server.
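
The behavior described above (the read is corrected, the stored copy is not) can be illustrated with a toy error-correcting code. The sketch below uses a textbook Hamming(7,4) code purely for illustration; it is an assumption made for this example and not the code used by the memory controller, which implements a wider code in hardware. The function names are hypothetical.

```python
def encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Codeword positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def read(stored):
    """Return (corrected data bits, error position); `stored` is not modified.

    An error position of 0 means no error was detected.
    """
    word = list(stored)  # work on a copy, like a read that does not write back
    s1 = word[0] ^ word[2] ^ word[4] ^ word[6]
    s2 = word[1] ^ word[2] ^ word[5] ^ word[6]
    s3 = word[3] ^ word[4] ^ word[5] ^ word[6]
    position = s1 + 2 * s2 + 4 * s3
    if position:
        word[position - 1] ^= 1  # repair the single flipped bit in the copy
    return [word[2], word[4], word[5], word[6]], position


if __name__ == "__main__":
    cell = encode([1, 0, 1, 1])
    cell[4] ^= 1                 # simulate a single-bit fault in the stored word
    print(read(cell))            # data is correct, but an error is reported
    print(read(cell))            # same error again: the stored word is unchanged
    cell = encode([0, 1, 1, 0])  # overwrite the location with new data
    print(read(cell))            # no error reported after the overwrite
```

Each read returns the correct data, yet the error is reported on every read until the location is overwritten, which matches the repeated faults described above.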

 

[Troubleshooting]

Because a correctable ECC error does not affect OS behavior, you may choose to ignore it.

To handle it explicitly, perform the following procedure:

1. Restart the OS
2. Reset the DIMM counter on UCSM
3. Reset CIMC (if the error persists after step 2)

Steps 2 and 3 do not affect the running OS.

If you are unable to restart the OS immediately, please try steps 2 and 3 first.
If the error persists thereafter, restart the OS during the maintenance period and try steps 2 and 3 once again.
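
The ordering rules above can be summarized as a small sketch. The function name `ucsm_plan` and the step strings are illustrative assumptions for this example only; they do not correspond to any UCSM command or API.

```python
def ucsm_plan(can_restart_now):
    """Return the troubleshooting steps in order for a UCSM-managed server."""
    reset_counter = "Reset the DIMM counter on UCSM"
    reset_cimc = "Reset CIMC (only if the error persists)"
    if can_restart_now:
        return ["Restart the OS", reset_counter, reset_cimc]
    # The OS cannot be restarted yet: start with the non-disruptive steps,
    # then repeat them after a restart in the maintenance period if needed.
    return [
        reset_counter,
        reset_cimc,
        "Restart the OS during the maintenance period (only if the error persists)",
        reset_counter,
        reset_cimc,
    ]


if __name__ == "__main__":
    for number, step in enumerate(ucsm_plan(can_restart_now=False), start=1):
        print(f"{number}. {step}")
```
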

[How to reset the DIMM counter]

1. Access UCSM.
2. Go to [Chassis] > [Chassis x] > [Servers] > [Server y].
3. On the right side of the screen, go to [Inventory] > [Memory], then double-click the applicable DIMM_xx entry.
4. In the popup window that shows the memory details, click [Reset Memory Errors].

 

[How to reset CIMC]

1. Access UCSM.
2. Go to [Chassis] > [Chassis x] > [Servers] > [Server y].
3. On the [General] tab on the right side of the screen, click [Recover Server].
4. In the popup window that appears, click [Reset CIMC].

 

[Troubleshooting for Standalone C-Series]

The troubleshooting is essentially the same as for B-Series.

1. Restart the OS
2. Reset CIMC

Resetting CIMC does not affect the running OS.
If you are unable to restart the OS immediately, try resetting CIMC first. If the error persists, restart the OS during the maintenance period and reset CIMC once again.

[How to reset CIMC on Standalone C-Series]

1. Access CIMC.
2. Go to [Admin] > [Utilities]. Click [Reboot CIMC].

Related Information

Original Document: https://supportforums.cisco.com/ja/document/12189356
Author: Akiyoshi Kawaguchi
Posted on April 29, 2014

Comments
AndrewCirel
Level 1

Please see the following Cisco white paper on correctable memory errors: https://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-manager/whitepaper-c11-736116.pdf

This page was written before the update to the firmware mentioned in this Cisco white paper.

Is this page still relevant? Can this page be updated with HTML5 screenshots?

If I see repeated correctable ECC errors on the same memory module, does that come from re-reading the errored bit on the memory again and again or is that because of new errored bits?

Can this
