03-04-2011 05:21 PM - edited 03-01-2019 09:51 AM
Hello,
While I'm waiting for TAC and Development on this, I wanted to see if anyone else might've seen this behavior. We have a new UCS Chassis with 3 B230 M1 blades, fully populated with memory.
This past weekend, one of the blades started reporting ECC errors on two of the DIMMs. TAC decided to replace them. The replacement DIMMs didn't work and the entire bank showed up as inoperable. The same slots showed up as inoperable regardless of what DIMMs were put into them (even DIMMs from another, known working slot.)
A motherboard swap was done, after which the same 4 DIMM slots still showed as inoperable and another bank went into disabled state.
After that, TAC sent out a new set of everything (processors, DIMMs, interface card, motherboard.) The memory slots that previously reported as disabled or inoperable were fine, but an entirely different set of 4 DIMM slots went into an inoperable state. It also didn't matter what known working DIMMs we put into those slots. The same slots always showed up as inoperable (BIOS shows failed.)
We have also tried a different position in the chassis for the blade but got the same results.
We're already running 1.4.1j.
Again, just curious if anyone else has experienced this and found a fix.
Thanks in advance,
Lewis Benton
03-04-2011 08:57 PM
Are all the DIMMs the same? What DIMMs are you using? What BIOS version is loaded on the motherboard?
03-05-2011 09:16 AM
All DIMMs are 8GB. TAC made sure they sent us the correct DIMMs. They actually sent double the amount we needed to replace every single DIMM.
BIOS is B230M1.1.4.1c.0.120820101441 (which is in the 1.4.1j package)
Thanks,
Lewis
03-07-2011 05:37 PM
Is your Board Controller running B2301009?
03-07-2011 05:51 PM
B230100A
I should note, I updated to 1.4.1(m) this morning, just to see if it would help.
After the BIOS update to 1.4.1d.0.021720111340 was done on the blade, it saw all the DIMMs as installed (UCSM still showed them as failed until I reset the memory errors.) I rebooted and powered off the server several times and they still show up.
I'm hoping to hear back from TAC and Dev if that should've fixed it. I don't think that was expected.
03-08-2011 02:39 PM
This may be an issue with DIMM training during BIOS...
03-14-2011 04:34 AM
Quick update: one week after updating to 1.4.1(m), the memory still looks good.