Has anyone started testing Windows 2012 with their UCS? I have a few RTM servers up and running and I've seen a couple of problems so far. They boot from SAN on B230 M2 blades with the M81KR card; that part is working fine.
I occasionally get hardware errors in the Windows system log:
Event ID 22
A fatal hardware error has occurred.
Error Source: BOOT
Error Type: 17
The details view of this entry contains further information.
I tried installing the chipset drivers, but the installation fails.
I am also seeing 1 of the 4 paths to the storage go down intermittently. I am troubleshooting this to see if it's a SAN, switch or server issue.
When I see the path go down I find a bunch of errors in the Windows system log:
Event ID 153
The IO operation at logical block address 147f8 for Disk 1 was retried.
We are running firmware 2.0(3a).
I'm running it in our lab now. I'm seeing some correctable memory errors, but UCSM is reporting the same errors.
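To see whether the retried IOs line up with the path drops, it can help to tally the Event ID 153 messages per disk. Below is a minimal Python sketch, assuming the System log has already been exported to plain text with one message per line. The export step and the function names `count_retries` and `retried_lbas` are hypothetical, not part of any Windows tool, and the sketch assumes the LBA in the message is hexadecimal, which matches values like 147f8 quoted in this thread.

```python
import re
from collections import Counter, defaultdict

# Pattern for the Event ID 153 message text as quoted in this thread.
# Assumption: the LBA is printed in hex (e.g. 147f8, 65401b48).
RETRY_RE = re.compile(
    r"The IO operation at logical block address ([0-9a-fA-F]+) "
    r"for Disk (\d+) was retried"
)


def count_retries(lines):
    """Count retried-IO messages per disk number."""
    counts = Counter()
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


def retried_lbas(lines):
    """Map each disk number to the sorted list of retried LBAs (as integers)."""
    lbas = defaultdict(set)
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            lbas[int(match.group(2))].add(int(match.group(1), 16))
    return {disk: sorted(values) for disk, values in lbas.items()}


if __name__ == "__main__":
    # Hypothetical sample lines; replace with the exported log text.
    sample = [
        "The IO operation at logical block address 147f8 for Disk 1 was retried.",
        "The IO operation at logical block address 65401b48 for Disk 1 was retried.",
        "Some unrelated event text",
    ]
    print(count_retries(sample))  # Counter({1: 2})
    print(retried_lbas(sample))
```

If the retries cluster on one disk around the times a path drops, that hints at the path rather than the disk itself; treat it as a heuristic for narrowing things down, not a diagnosis.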
Are you seeing corresponding errors in UCSM when you get errors in Windows?
I'm running on a B200 M1 and a B200 M2 with no issues. Currently I'm booting locally, but I did have SAN boot working with the RC release.
I do have a B230 but I won't be able to test with it until Monday.
If you could check the UCSM logs, it would help isolate the problem. Also, what storage are you using (array vendor and model)?
I'm seeing events that I believe correspond with the loss of the path to the storage:
After a few of these the server reboots itself.
I'm not seeing anything in UCSM for the Windows memory errors though.
Our storage is a Dell Compellent SC040.
I'm going to do some comparison testing at our DR site; we have an identical UCS and Compellent array there, but the Compellent is directly attached to the FIs. At our primary site the Compellent is attached to our Nexus 5020s. I don't think the Nexus switches are the cause of the storage path issue, but it's worth a test.
Let me know how you make out with your B230 testing.
We just put together a Hyper-V cluster on IBM equipment (not my choice) and are having issues with the Windows logs filling up with:
The IO operation at logical block address 65401b48 (the number varies) for Disk 1 was retried.
I don't think it is a Cisco problem; I believe this is a Windows Server 2012 / Windows 8 problem.
Regarding the memory errors (which I know are old news and no longer the main focus of this discussion): this is a bug in Windows 8 (and, I imagine, Windows 2012) that triggers a latent bug on EX-processor-based UCS blades such as the B230 and B440. It is fixed in 2.1(1a) and later.
CSCub29699 - Win2k8 and B230 M2 generates memory fault on OS only.
After that bug was fixed, another one was found that is still under investigation, but it appears to have a different root cause and solution. Unfortunately, this bug is still internal. If the memory error on boot continues after upgrading to 2.1(1a) or later, please open a TAC case so the additional bug can be updated to show that customers have hit it, at which point it will be made externally visible.
After the fix for CSCub29699 was implemented, the frequency of these memory errors on boot went down significantly. In testing, the errors went from occurring nearly 100% of the time to about one third of one percent of the time.
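To put "one third of one percent" in more practical terms: it works out to roughly one affected boot in 300. The second line below is only an illustration and assumes each boot is independent with the same failure probability, which the testing above does not establish; $n$ is a hypothetical number of boots.

```latex
p = \frac{1}{3} \times \frac{1}{100} = \frac{1}{300} \approx 0.0033

P(\text{at least one error in } n \text{ boots}) = 1 - \left(1 - \frac{1}{300}\right)^{n},
\qquad n = 100 \;\Rightarrow\; 1 - \left(\frac{299}{300}\right)^{100} \approx 0.28
```

So even after the fix, a lab that reboots a server frequently could still see the error occasionally, which fits the advice to open a TAC case if it keeps showing up.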