
Cluster state: warning F0865 device shared storage error

Walter Dey
VIP Alumni

I see a Warning message in UCSM, and find

fltMgmtEntityDevice-3-shared-storage error

Fault Code:F0865

Message

device [chassis3], error accessing shared-storage

Explanation

This fault occurs in an unlikely event that the shared storage selected for writing the cluster state is not accessible. This impacts the full HA functionality of the fabric interconnect cluster.

Recommended Action

If you see this fault, create a show tech-support file and contact Cisco TAC.

US-CUCSFISF00-01-A# sho cluster extended-state

Cluster Id: 0xa5b70ab0a1ab11df-0xa46300059b756e44

 

Start time: Thu Sep 10 15:49:38 2015

Last election time: Fri Dec 11 09:31:45 2015

 

A: UP, SUBORDINATE

B: UP, PRIMARY

 

A: memb state UP, lead state SUBORDINATE, mgmt services state: UP

B: memb state UP, lead state PRIMARY, mgmt services state: UP

   heartbeat state PRIMARY_OK

 

INTERNAL NETWORK INTERFACES:

eth1, UP

eth2, UP

 

HA READY

Detailed state of the device selected for HA storage:

Chassis 1, serial: FOX1404GBJX, state: active

Chassis 2, serial: FOX1409G10Z, state: active

Chassis 3, serial: FOX1438GA40, state: active with errors

 

Fabric B, chassis-seeprom local IO failure:

FOX1438GA40 READ_FAILED, error: TIMEOUT, error code: 10, error count: 8

Warning: there are pending I/O errors on one or more devices, failover may not complete

 

Any advice?

3 Replies 3

Keny Perez
Level 8

Walter,

To begin, the shared storage is on each chassis midplane. That is why you see 3 chassis in the "sh cluster extended" command, with a maximum of 3 chassis listed at the bottom of the output. The link you checked also states: "This fault occurs in an unlikely event that the shared storage selected for writing the cluster state is not accessible."

Since chassis 3 is the one that says "active with errors", that is the one we'll investigate...
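If you have many domains to check, picking out the affected chassis can be scripted. Below is a minimal Python sketch that scans saved "show cluster extended-state" output for any chassis whose state is not plain "active", and for the SEEPROM I/O failure lines. The line formats are assumed from the output Walter pasted above; the function names are hypothetical, and other UCSM releases may print these lines differently.

```python
import re

# Sample lines copied from the output posted in this thread.
SAMPLE = """\
Chassis 1, serial: FOX1404GBJX, state: active
Chassis 2, serial: FOX1409G10Z, state: active
Chassis 3, serial: FOX1438GA40, state: active with errors
Fabric B, chassis-seeprom local IO failure:
FOX1438GA40 READ_FAILED, error: TIMEOUT, error code: 10, error count: 8
"""

# Assumed formats, based only on the output shown in this thread.
CHASSIS_RE = re.compile(r"^Chassis (\d+), serial: ([^,\s]+), state: (.+)$")
FAILURE_RE = re.compile(
    r"^(\S+) (\w+), error: (\w+), error code: (\d+), error count: (\d+)$"
)


def chassis_with_errors(text):
    """Return (chassis_id, serial, state) for each chassis not plainly 'active'."""
    found = []
    for line in text.splitlines():
        m = CHASSIS_RE.match(line.strip())
        if m and m.group(3).strip() != "active":
            found.append((int(m.group(1)), m.group(2), m.group(3).strip()))
    return found


def io_failures(text):
    """Return (serial, failure, error, code, count) for each I/O failure line."""
    found = []
    for line in text.splitlines():
        m = FAILURE_RE.match(line.strip())
        if m:
            found.append(
                (m.group(1), m.group(2), m.group(3), int(m.group(4)), int(m.group(5)))
            )
    return found


if __name__ == "__main__":
    print(chassis_with_errors(SAMPLE))
    print(io_failures(SAMPLE))
```

Matching the serial number between the two results tells you which chassis the READ_FAILED line belongs to (chassis 3 here).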

Open 2 SSH sessions and do "connect local a" in one and "connect local b" in the other one, then: 

#connect iom 3  << This will connect to the IOM each FI is attached to

#show platform soft cmc showi2c  << this will show you the I2C bus

Paste the output here and we can check it.

More info: http://www.slideshare.net/AvinashSingal/my-i2c

Another thread where we did the same: https://supportforums.cisco.com/discussion/12133126/ucs-chassis-fans

-Kenny

Qiese Dides
Cisco Employee

What UCS version are you currently running, Walter?

Also, does the error look like the following message?

affected object: sys/mgmt-entity-B
code: E4196536
cause: device-shared-storage-IO
Description: device Serial Number, error accessing shared storage

If that is the error, there is no need to worry. A lot of people see this and think it is storage related; it is not. The shared storage errors occur when the IOMs are not able to communicate with the SEEPROM. This error is a result of the hardware design of the SEEPROMs and the chassis.

The bug related to this defect is:

https://tools.cisco.com/bugsearch/bug/CSCtu17144/?reffering_site=dumpcr

There are two possible scenarios when this happens:

  A) If the error accessing shared-storage fault is currently in cleared state and is not raised again, do not apply the workaround; no action is needed.
  B) If the error accessing shared-storage fault is in raised state and is never cleared, or the fault keeps flapping (raised, then cleared, repeatedly), try the following:

  1. SSH to your Fabric Interconnect.
  2. Switch to side A local-management:

connect local-mgmt a

  3. Connect to the side A IOM of your chassis:

connect iom <chassis #>

  4. Enter this command to find out if it's active or not:

show platform soft cmc thermal status | grep status:

=> If it says PASSIVE, you need to restart this IOM (IOM 1 on fabric A) via UCSM. If it says ACTIVE, it's the other side (IOM 2 on fabric B) that needs to be reset.

  5. To do the reset, go to Equipment > Chassis > Chassis <#> > IO Modules > IO Module 1/2 > General tab > Reset IO Module.
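The PASSIVE/ACTIVE rule above is easy to get backwards, so here it is as a small Python sketch. It assumes you ran the status command from the side A IOM, as in the steps above, and that the grep'd line contains the word ACTIVE or PASSIVE. The exact output format is an assumption, and the function name is hypothetical.

```python
def iom_to_reset(status_line: str) -> str:
    """Map the status seen on the fabric A IOM to the IOM that should be reset.

    Rule from the steps above: PASSIVE on the side A IOM means reset that
    same IOM (IO Module 1); ACTIVE means reset the other side (IO Module 2).
    """
    text = status_line.upper()
    # Check PASSIVE first so the two cases stay clearly separated.
    if "PASSIVE" in text:
        return "IO Module 1"
    if "ACTIVE" in text:
        return "IO Module 2"
    raise ValueError(f"no ACTIVE/PASSIVE status found in: {status_line!r}")


if __name__ == "__main__":
    # Example input only; real CLI output may be formatted differently.
    print(iom_to_reset("status: PASSIVE"))
```

In both cases the IOM to reset is the passive one, as seen from side A.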

If this doesn't work, the full workaround is below:

Remove PSU1, let it sit for 2 minutes, replace it, wait 10 seconds, confirm PSU1 has power, then move to PSU2
Remove PSU2, let it sit for 2 minutes, replace it, wait 10 seconds, confirm PSU2 has power, then move to PSU3
Remove PSU3, let it sit for 2 minutes, replace it, wait 10 seconds, confirm PSU3 has power, then move to PSU4
Remove PSU4, let it sit for 2 minutes, replace it, wait 10 seconds, confirm PSU4 has power, then move to Fan1

Remove Fan1, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan1 has power, then move to Fan2
Remove Fan2, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan2 has power, then move to Fan3
Remove Fan3, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan3 has power, then move to Fan4
Remove Fan4, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan4 has power, then move to Fan5
Remove Fan5, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan5 has power, then move to Fan6
Remove Fan6, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan6 has power, then move to Fan7
Remove Fan7, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan7 has power, then move to Fan8
Remove Fan8, let it sit for 30 seconds, replace it, wait 10 seconds, confirm Fan8 has power, then move to IO Mod 1

Remove IO Mod 1, let it sit for 5 minutes, replace it, and confirm that IO Mod 1 is up and running before you reseat IO Mod 2
Once IO Mod 1 is up and running, remove IO Mod 2, let it sit for 5 minutes, and place it back into the chassis.

This is the complete reseat process to clear the I2C bus.

Keny Perez
Level 8

Was this info helpful, Walter?

-Kenny
