02-18-2016 02:30 AM - edited 03-01-2019 12:36 PM
I see a warning message in UCSM with the following fault:
fltMgmtEntityDevice-3-shared-storage error
Fault Code:F0865
Message
device [chassis3], error accessing shared-storage
Explanation
This fault occurs in an unlikely event that the shared storage selected for writing the cluster state is not accessible. This impacts the full HA functionality of the fabric interconnect cluster.
Recommended Action
If you see this fault, create a show tech-support file and contact Cisco TAC.
US-CUCSFISF00-01-A# sho cluster extended-state
Cluster Id: 0xa5b70ab0a1ab11df-0xa46300059b756e44
Start time: Thu Sep 10 15:49:38 2015
Last election time: Fri Dec 11 09:31:45 2015
A: UP, SUBORDINATE
B: UP, PRIMARY
A: memb state UP, lead state SUBORDINATE, mgmt services state: UP
B: memb state UP, lead state PRIMARY, mgmt services state: UP
heartbeat state PRIMARY_OK
INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP
HA READY
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1404GBJX, state: active
Chassis 2, serial: FOX1409G10Z, state: active
Chassis 3, serial: FOX1438GA40, state: active with errors
Fabric B, chassis-seeprom local IO failure:
FOX1438GA40 READ_FAILED, error: TIMEOUT, error code: 10, error count: 8
Warning: there are pending I/O errors on one or more devices, failover may not complete
Any advice?
02-18-2016 05:23 AM
Walter,
To begin, the shared storage is on each chassis midplane. That is why you see 3 chassis in the "sh cluster extended" output, with a max of 3 chassis listed at the bottom. The explanation you found also states: "This fault occurs in an unlikely event that the shared storage selected for writing the cluster state is not accessible."
Since chassis 3 is the one that says "active with errors", that is the one we'll investigate...
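If you save that output to a file, picking out the affected chassis can be scripted. Below is a minimal Python sketch, assuming the "Chassis N, serial: ..., state: ..." line format matches what you pasted. The function name and the regular expression are mine, not anything shipped with UCSM.

```python
import re

# Excerpt of the "show cluster extended-state" output posted above.
SAMPLE = """\
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1404GBJX, state: active
Chassis 2, serial: FOX1409G10Z, state: active
Chassis 3, serial: FOX1438GA40, state: active with errors
Fabric B, chassis-seeprom local IO failure:
FOX1438GA40 READ_FAILED, error: TIMEOUT, error code: 10, error count: 8
"""

# Assumed line format, taken from the pasted output only.
CHASSIS_RE = re.compile(r"^Chassis (\d+), serial: ([^,\s]+), state: (.+)$")


def chassis_with_errors(text):
    """Return (chassis_id, serial, state) for each chassis not in plain 'active' state."""
    hits = []
    for line in text.splitlines():
        match = CHASSIS_RE.match(line.strip())
        if match and match.group(3).strip() != "active":
            hits.append((int(match.group(1)), match.group(2), match.group(3).strip()))
    return hits


print(chassis_with_errors(SAMPLE))
```

With the output from this thread, the only entry returned is chassis 3.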
Open 2 SSH sessions and do "connect local a" in one and "connect local b" in the other one, then:
#connect iom 3 << This will connect to the IOM each FI is attached to
#show platform soft cmc showi2c << this will show you the I2C bus
Paste the output here and we can check it.
More info: http://www.slideshare.net/AvinashSingal/my-i2c
Another thread where we did the same: https://supportforums.cisco.com/discussion/12133126/ucs-chassis-fans
-Kenny
02-18-2016 06:39 AM
What UCS version are you currently running, Walter?
Also, does the error look like the following message?
affected object: sys/mgmt-entity-B
code: E4196536
cause: device-shared-storage-IO
Description: device Serial Number, error accessing shared storage
If that is the error, there is no need to worry. A lot of people see this and think it is storage related; it is not. The shared-storage errors mean the IOMs are not able to communicate with the SEEPROM. This error comes from the hardware design of the SEEPROMs and the chassis.
The bug related to this defect is:
https://tools.cisco.com/bugsearch/bug/CSCtu17144/?reffering_site=dumpcr
There are two possible scenarios when this happens. To find out which one applies, run:
connect local-mgmt a
connect iom <chassis #>
show platform soft cmc thermal status | grep status:
=> If it says PASSIVE, you need to restart this IOM (IOM 1 on fabric A)
via UCSM, if it says ACTIVE, it's the other side (IOM 2 on fabric B) that
needs to be reset.
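That rule can be written down as a small Python helper if you want it in a runbook. This is only a sketch: it assumes you connected through fabric A as in the commands above, and that the filtered line contains the word PASSIVE or ACTIVE. The exact layout of the status line is an assumption, and `iom_to_reset` is a made-up name.

```python
def iom_to_reset(status_line):
    """Map the thermal status seen from fabric A to the IOM that should be reset.

    Assumption: status_line is the text returned by the 'grep status:' filter
    and contains either PASSIVE or ACTIVE as a separate word.
    """
    words = status_line.upper().replace(":", " ").split()
    if "PASSIVE" in words:
        # PASSIVE on this side: restart this IOM.
        return "IOM 1 (fabric A)"
    if "ACTIVE" in words:
        # ACTIVE on this side: the other side needs the reset.
        return "IOM 2 (fabric B)"
    raise ValueError(f"no PASSIVE/ACTIVE status found in: {status_line!r}")


print(iom_to_reset("status: PASSIVE"))
```

The reset itself is still done from UCSM, as described above.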
If this doesn't work, the full workaround is below:
Remove PSU1, let it sit for 2 minutes, replace it, wait 10 seconds, confirm PSU1 has power, then move to PSU2
Remove PSU2, let it sit for 2 minutes, replace it, wait 10 seconds, confirm PSU2 has power, then move to PSU3
Remove PSU3, let it sit for 2 minutes, replace it, wait 10 seconds, confirm PSU3 has power, then move to PSU4
Remove PSU4, let it sit for 2 minutes, replace it, wait 10 seconds, confirm PSU4 has power, then move to Fan1
Remove Fan1, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan1 has power, then move to Fan2
Remove Fan2, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan2 has power, then move to Fan3
Remove Fan3, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan3 has power, then move to Fan4
Remove Fan4, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan4 has power, then move to Fan5
Remove Fan5, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan5 has power, then move to Fan6
Remove Fan6, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan6 has power, then move to Fan7
Remove Fan7, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan7 has power, then move to Fan8
Remove Fan8, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan8 has power, then move to IO Mod 1
Remove IO Mod 1, let it sit for 5 minutes, replace it, and confirm that IO Mod 1 is up and running before you reseat IO Mod 2
Once IO Mod 1 is up and running, remove IO Mod 2, let it sit for 5 minutes, and place it back into the chassis.
This is the complete reseat process to clear the i2c bus.
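Because the order and the wait times matter, it can help to print the sequence as a checklist before going to the rack. Here is a minimal Python sketch, assuming a chassis with 4 PSUs, 8 fans and 2 IO modules as in the steps above. The `reseat_plan` name and the tuple layout are only for illustration.

```python
def reseat_plan(psu_count=4, fan_count=8):
    """Build the reseat order as (part, seconds_out, check_after_reinsert) tuples."""
    plan = []
    for n in range(1, psu_count + 1):
        plan.append((f"PSU{n}", 120, f"wait 10 s, confirm PSU{n} has power"))
    for n in range(1, fan_count + 1):
        plan.append((f"Fan{n}", 30, f"wait 10 s, confirm Fan{n} has power"))
    # IO modules go last, one at a time, so one fabric path stays up.
    plan.append(("IO Mod 1", 300, "confirm IO Mod 1 is up and running"))
    plan.append(("IO Mod 2", 300, "place it back into the chassis"))
    return plan


for step, (part, seconds_out, check) in enumerate(reseat_plan(), start=1):
    print(f"{step:2d}. Remove {part}, leave out {seconds_out} s, reinsert; {check}")
```

The script only prints the steps; every removal is still a manual action on the chassis.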
02-22-2016 05:32 AM
Was this info helpful, Walter?
-Kenny