06-07-2012 09:53 AM - edited 03-01-2019 10:27 AM
We have two different UCS systems that show chassis-seeprom errors (one is a brand new production system). E.g.,
cnat-pod1-ucs6248-A# show cluster state
Cluster Id: 0xde5abc12999911e1-0xb8c9547fee4ca524
A: UP, PRIMARY
B: UP, SUBORDINATE
HA READY
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1326G5KD, state: active with errors
Fabric A, chassis-seeprom local IO failure:
FOX1326G5KD READ_FAILED, error: TIMEOUT, error code: 10, error count: 37503
Fabric B, chassis-seeprom local IO failure:
FOX1326G5KD READ_FAILED, error: TIMEOUT, error code: 10, error count: 37504
Warning: there are pending SEEPROM errors on one or more devices, failover may not complete
UCSM seems blissfully unaware of the error. It does occasionally log events:
cnat-pod1-ucs6248-A# show event | include shared-storage
2012-05-12T15:28:14.547 78004 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-12T15:28:14.546 78002 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-11T05:43:14.544 77962 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-10T16:43:14.544 76405 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-09T20:43:14.544 74748 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-09T13:28:14.544 67198 E4196535 device FOX1326G5KD, error accessing shared-storage
or in the GUI --
In my case, though, the error counters keep going up every few seconds, yet there hasn't been an event for 3 weeks. Cisco's message/fault guide indicates that this event is a very serious error and that TAC should be contacted. If that's the case, why are these logged as events and not faults? Why is there no call-home trigger for this condition?
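For anyone who wants to confirm that the counters are still climbing without watching the CLI by hand, here is a minimal Python sketch. It assumes you have already captured the text of `show cluster state` twice (how you capture it is up to you). The function names and the second sample value are made up for illustration; none of this is something UCSM itself provides.

```python
import re

# Matches the two-line pattern seen in the output above, for example:
#   Fabric A, chassis-seeprom local IO failure:
#   FOX1326G5KD READ_FAILED, error: TIMEOUT, error code: 10, error count: 37503
FABRIC_RE = re.compile(r"Fabric\s+([AB]),\s+chassis-seeprom")
COUNT_RE = re.compile(r"(\S+)\s+READ_FAILED,.*?error count:\s*(\d+)")


def seeprom_error_counts(text):
    """Return {(fabric, serial): error_count} parsed from captured CLI text."""
    counts = {}
    fabric = None
    for line in text.splitlines():
        m = FABRIC_RE.search(line)
        if m:
            fabric = m.group(1)  # remember which fabric the next line belongs to
            continue
        m = COUNT_RE.search(line)
        if m and fabric is not None:
            counts[(fabric, m.group(1))] = int(m.group(2))
    return counts


def rising_counters(earlier, later):
    """Return the (fabric, serial) keys whose count grew between two samples."""
    return sorted(k for k, v in later.items() if v > earlier.get(k, 0))


# First sample: the relevant lines from the output posted above.
sample_1 = """\
Fabric A, chassis-seeprom local IO failure:
FOX1326G5KD READ_FAILED, error: TIMEOUT, error code: 10, error count: 37503
Fabric B, chassis-seeprom local IO failure:
FOX1326G5KD READ_FAILED, error: TIMEOUT, error code: 10, error count: 37504
"""
# Second sample: HYPOTHETICAL later reading, invented for this example only.
sample_2 = sample_1.replace("37503", "37510")

before = seeprom_error_counts(sample_1)
after = seeprom_error_counts(sample_2)
print(rising_counters(before, after))
```

Comparing two captures taken a few minutes apart is enough to tell an occasional read failure from one that never stops, which is the distinction that matters here.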
-Craig
06-07-2012 05:58 PM
There is a bug: CSCtu17144
The "error accessing shared-storage" event is not harmful and does not affect system operability.
If the error counters keep going up, try the following workaround:
1. Unplug the IO module.
2. Reinsert the IO module. Make sure the module is firmly in contact with the backplane.
3. Reboot the IO module.
If the event keeps coming and going after the workaround, you can leave it alone, since it does not hurt the system. If it never clears, it may be a chassis issue, so you may want to open a TAC case to confirm that.
07-09-2012 11:58 AM
Zaira,
Is your advice true if the error count is constantly increasing? Below is a related TAC note that one of my colleagues received:
Q: Why the error accessing shared-storage fault can happen and why it is not harmful.
A: In the UCS chassis design, we build a chip called SEEPROM into the backplane. SEEPROM is permanent memory used to store the SAM DB version, to avoid the SAM DB being overwritten by an old version when failover happens. The communication between the IO module and the SEEPROM, through wiring on the backplane, is not reliable. To overcome this difficulty, we store the identical SAM DB version in three chassis rather than in one (so-called three-chassis redundancy). Because the communication between the IO module and the SEEPROM is not reliable, the error accessing shared-storage fault can happen sometimes - this is system behavior per specification and design. So, as long as one SEEPROM is readable, the UCS works normally. In your reported case, one chassis SEEPROM has a read problem and the other two work. So, the error accessing shared-storage fault is not harmful and does not affect the system operability.
It seems that this isn't harmful if the error happens periodically. But what if the SEEPROM error count is increasing every minute? Doesn't that indicate that the SEEPROM can never be read? Wouldn't that be very serious if the UCS system had only one or two chassis?
07-09-2012 12:09 PM
cweinhold,
If the fault is still there after the workaround and never clears, a chassis replacement could be considered, but TAC has to check the logs first to confirm that.
07-09-2012 12:31 PM
Thanks for the reply.
One follow-up: how does UCSM handle SAM DB versioning and dual-active detection on a UCS system that has no chassis and is entirely used for C-Series integration? I.e., when there are no SEEPROMs.