Just trying to get to the bottom of what this requires to fix. I understand what it's telling me, and I was just going to reset the CIMC, but on investigation I am a little confused.
It states in the early paragraphs that once the memory is degraded, it will no longer be re-evaluated until it is changed, even if you perform a CIMC reset. But then it later states that you can indeed force re-evaluation by resetting the CIMC?
So are they saying that once you see this error, the threshold has been reached and you need new RAM because the current RAM has performed below expectations? Or should you reset the CIMC and see whether it breaches the threshold again (since it has been reset)?
My worry is that I reset the CIMC, the ECC threshold is no longer evaluated, and the DIMM fails completely.
Just to clarify: reset memory errors does not equal reset CIMC.
Actually, resetting CIMC should never be done to clear DIMM errors. Doing so is equivalent to sweeping a potential problem under the rug, and it has the side effect of deleting files in CIMC that may be helpful in investigating the cause of the error.
In 1.3 and earlier firmware, resetting CIMC was the easiest way to get UCSM to re-evaluate the DIMM status based on what it was seeing from CIMC (another, more disruptive method would be to decommission and re-acknowledge the blade). For errors that do not occur frequently, this could reset the DIMM status to operable in UCSM without much impact on the operation of the system, but if the error returned, what have you accomplished?
This behavior changed in 1.4 firmware. In 1.4 and later, resetting CIMC has no effect on the DIMM status in UCSM. Once a DIMM goes degraded or inoperable, the only ways to clear that state in UCSM are to change the FRU information on the DIMM (i.e. replace it), to decommission and re-acknowledge the server (i.e. the server starts over from scratch), or to use the reset memory errors functionality.
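To make the version difference concrete, here is a rough sketch that encodes the rules described in this answer as a lookup. It is a hypothetical model written for illustration, not UCSM code: the action labels and the `clears_dimm_status` function are invented, and anything the answer does not describe (for example DIMM replacement on 1.3) returns `None` rather than a guess.

```python
from typing import Optional, Tuple

# Hypothetical model of the UCSM behavior described in this thread; this is
# NOT Cisco code, and the action labels below are invented for illustration.
RESET_CIMC = "reset_cimc"
REPLACE_DIMM = "replace_dimm"                # changes the DIMM's FRU information
DECOMMISSION_REACK = "decommission_reack"    # decommission + re-acknowledge
RESET_MEMORY_ERRORS = "reset_memory_errors"  # only available in 1.4 and later

# In 1.4+, only these actions take a DIMM out of degraded/inoperable state.
CLEARS_IN_1_4 = {REPLACE_DIMM, DECOMMISSION_REACK, RESET_MEMORY_ERRORS}


def clears_dimm_status(action: str, firmware: Tuple[int, int]) -> Optional[bool]:
    """Return True if `action` makes UCSM re-evaluate (and potentially clear)
    a degraded or inoperable DIMM status, False if it does not, or None if
    the thread does not say. `firmware` is (major, minor), e.g. (1, 4)."""
    if firmware >= (1, 4):
        # 1.4 and later: resetting CIMC no longer affects the DIMM status.
        return action in CLEARS_IN_1_4
    # 1.3 and earlier.
    if action == RESET_MEMORY_ERRORS:
        return False  # the feature did not exist yet
    if action in (RESET_CIMC, DECOMMISSION_REACK):
        return True   # UCSM re-evaluates; the DIMM may show operable again
    return None       # e.g. DIMM replacement on 1.3 is not covered here
```

The case the original poster worried about is the first branch: on 1.4 or later, `clears_dimm_status(RESET_CIMC, (1, 4))` is `False`, so a CIMC reset leaves the DIMM status in UCSM unchanged.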
Reset memory errors was added in 1.4 and later firmware because in 1.3 firmware, UCSM essentially ignored correctable errors. During testing of upgrades from 1.3 to 1.4, it was found that if a system had many correctable errors that occurred long ago, UCSM would, once upgraded, suddenly see all of those historical correctable errors as new ones and set the DIMM status to degraded. Reset memory errors was added to clear that specific condition, as well as any other false-positive degraded or inoperable DIMM status. Using reset memory errors outside of this context is similar to resetting CIMC: it sweeps a potential problem under the rug.
Regarding your specific problem: if the number of correctable errors continues to increase, then yes, the recommended course of action is what you suggest, replacing the DIMM.
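One way to decide whether the count "continues to increase" is to record it periodically and compare the latest readings. The sketch below assumes you already collect per-DIMM correctable-error counts yourself; the `still_increasing` helper and the sample numbers are hypothetical and not a UCSM feature.

```python
from typing import Sequence


def still_increasing(samples: Sequence[int]) -> bool:
    """Hypothetical helper: True if the DIMM's correctable-error count grew
    between the two most recent samples.

    `samples` are counts for one DIMM, oldest first, collected at regular
    intervals by whatever means you already use (not shown here)."""
    if len(samples) < 2:
        return False  # not enough history to see a trend
    return samples[-1] > samples[-2]


# Example with made-up counts:
print(still_increasing([120, 120, 120]))  # False: count has leveled off
print(still_increasing([120, 135, 160]))  # True: still climbing, consider replacing
```

Comparing only the last two samples keeps the sketch simple; a longer window would avoid treating a quiet interval between bursts of errors as a leveled-off count.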