Cisco-Hta
Beginner

UCS 6248 HA NOT READY

Hi,

We have two 6248 FIs configured in cluster mode. Since last night, all ports on the first FI have been down, and when I check I get:

show cluster extended-state
Cluster Id: 0xd65744c6a6b311e3-0xa438002a6a6afdc4

Start time: Wed Nov 23 09:42:26 2016
Last election time: Wed Nov 23 09:43:44 2016

A: UP, SUBORDINATE
B: UP, PRIMARY

A: memb state UP, lead state SUBORDINATE, mgmt services state: UP
B: memb state UP, lead state PRIMARY, mgmt services state: UP
heartbeat state PRIMARY_OK

INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP

HA NOT READY
No device connected to this Fabric Interconnect
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1730G5TP, state: inactive
Chassis 2, serial: FOX1730G5UE, state: inactive

show tech-support ethport

Ethernet   VLAN  Type  Mode    Status  Reason            Speed    Port
Interface                                                         Ch #
--------------------------------------------------------------------------------
Eth1/1     4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/2     4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/3     4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/4     4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/5     1     eth   access  down    Hardware failure  10G(D)   --
Eth1/6     1     eth   access  down    Hardware failure  10G(D)   --
Eth1/7     1     eth   access  down    Hardware failure  10G(D)   --
Eth1/8     1     eth   access  down    Hardware failure  10G(D)   --
Eth1/9     1     eth   access  down    Hardware failure  10G(D)   --
Eth1/10    1     eth   access  down    Hardware failure  10G(D)   --
Eth1/11    1     eth   access  down    Hardware failure  10G(D)   --
Eth1/12    1     eth   access  down    Hardware failure  10G(D)   --
Eth1/13    1     eth   access  down    Hardware failure  10G(D)   --
Eth1/14    1     eth   access  down    Hardware failure  10G(D)   --
Eth1/15    1     eth   trunk   down    Hardware failure  10G(D)   10
Eth1/16    1     eth   trunk   down    Hardware failure  10G(D)   10
Eth1/17    4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/18    4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/19    4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/20    4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/21    4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/22    4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/23    4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/24    4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/25    4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/26    4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/27    4044  eth   trunk   down    Hardware failure  10G(D)   --
Eth1/28    4044  eth   trunk   down    Hardware failure  10G(D)   --

 

Has anyone run into this problem? Thank you.
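
For anyone who wants to check a port listing like the one in this post from a script, here is a minimal Python sketch. It assumes the output has already been saved as text and that each row follows the column order shown in the post (interface, VLAN, type, mode, status, reason, speed, port channel). The names `parse_ports` and `summarize` are made up for illustration and are not part of any Cisco tool.

```python
import re
from collections import Counter

# A few rows copied from the output in this post (abbreviated sample).
SAMPLE = """\
Eth1/1 4044 eth trunk down Hardware failure 10G(D) --
Eth1/5 1 eth access down Hardware failure 10G(D) --
Eth1/15 1 eth trunk down Hardware failure 10G(D) 10
"""

# Assumed column order: interface, VLAN, type, mode, status,
# reason (may contain spaces), speed, port channel.
ROW = re.compile(
    r"^(?P<interface>Eth\d+/\d+)\s+(?P<vlan>\S+)\s+(?P<type>\S+)\s+"
    r"(?P<mode>\S+)\s+(?P<status>\S+)\s+(?P<reason>.+?)\s+"
    r"(?P<speed>\S+)\s+(?P<port_channel>\S+)\s*$"
)


def parse_ports(text):
    """Return one dict per interface row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def summarize(rows):
    """Count interfaces per (status, reason) pair."""
    return Counter((row["status"], row["reason"]) for row in rows)


if __name__ == "__main__":
    for (status, reason), count in summarize(parse_ports(SAMPLE)).items():
        print(f"{count} port(s): {status} / {reason}")
```

If every port, including unconnected ones, lands in the same "Hardware failure" bucket as it does here, that points at the FI itself rather than at individual cables or SFPs.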

10 REPLIES
Walter Dey
Advocate

It looks like a quorum problem, meaning the communication between the FI and the IOMs in the chassis is not OK.

Did you check in UCSM that you can see your chassis?

Yes, I checked. All the chassis are OK on the second FI. The problem is that all the ports are in hardware failure, even the ones with nothing connected.

Here are screenshots.

Which UCS version?

Did these errors happen out of the blue, or did you make some configuration changes? Was there a power failure?

I would try rebooting the failing FI.

Version 2.2(1c). I did not make any changes; this happened out of the blue. I tried rebooting the failing FI, but got the same result.

Hi

I would open a TAC case. It could really be a hardware issue requiring an RMA.

Is this a production environment?

You could disconnect the heartbeat, erase the configuration of the failing FI, and then have it rejoin the cluster, letting the healthy FI do the sync.

I erased the configuration and added the FI to the cluster again, but got the same result.


Start time: Thu Nov 24 14:56:00 2016
Last election time: Thu Nov 24 14:57:06 2016

A: UP, SUBORDINATE
B: UP, PRIMARY

A: memb state UP, lead state SUBORDINATE, mgmt services state: UP
B: memb state UP, lead state PRIMARY, mgmt services state: UP
heartbeat state PRIMARY_OK

INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP

HA NOT READY
No device connected to this Fabric Interconnect
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1730G5TP, state: inactive
Chassis 2, serial: FOX1730G5UE, state: inactive

Fabric B, chassis-seeprom local IO failure:
FOX1730G5TP READ_FAILED, error: TIMEOUT, error code: 10, error count: 4
Fabric B, chassis-seeprom local IO failure:
FOX1730G5UE READ_FAILED, error: TIMEOUT, error code: 10, error count: 7
Warning: there are pending I/O errors on one or more devices, failover may not complete
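
As a side note for anyone monitoring this condition, the output above can be reduced to a few fields with a short Python sketch. It assumes the text of `show cluster extended-state` has already been captured, and it only recognizes the line formats that appear in this thread. The function name `parse_cluster_state` and the dictionary keys are hypothetical, chosen for illustration.

```python
import re

# Lines copied from the output in this thread (abbreviated sample).
SAMPLE = """\
A: UP, SUBORDINATE
B: UP, PRIMARY
HA NOT READY
No device connected to this Fabric Interconnect
Chassis 1, serial: FOX1730G5TP, state: inactive
Chassis 2, serial: FOX1730G5UE, state: inactive
Fabric B, chassis-seeprom local IO failure:
FOX1730G5TP READ_FAILED, error: TIMEOUT, error code: 10, error count: 4
Fabric B, chassis-seeprom local IO failure:
FOX1730G5UE READ_FAILED, error: TIMEOUT, error code: 10, error count: 7
"""

CHASSIS = re.compile(r"^Chassis (\d+), serial: (\S+), state: (\S+)$")
IO_FAIL = re.compile(
    r"^(\S+) (\S+), error: (\S+), error code: (\d+), error count: (\d+)$"
)


def parse_cluster_state(text):
    """Extract the HA flag, chassis states, and SEEPROM I/O failures."""
    lines = [line.strip() for line in text.splitlines()]
    state = {
        "ha_not_ready": "HA NOT READY" in lines,
        "chassis": [],
        "io_failures": [],
    }
    for line in lines:
        chassis = CHASSIS.match(line)
        if chassis:
            state["chassis"].append(
                {"id": int(chassis.group(1)),
                 "serial": chassis.group(2),
                 "state": chassis.group(3)}
            )
            continue
        failure = IO_FAIL.match(line)
        if failure:
            state["io_failures"].append(
                {"serial": failure.group(1),
                 "result": failure.group(2),
                 "error": failure.group(3),
                 "code": int(failure.group(4)),
                 "count": int(failure.group(5))}
            )
    return state


if __name__ == "__main__":
    parsed = parse_cluster_state(SAMPLE)
    print("HA not ready:", parsed["ha_not_ready"])
    for failure in parsed["io_failures"]:
        print(failure["serial"], failure["error"], "count", failure["count"])
```

Comparing the error counts between two captures taken a few minutes apart shows whether the SEEPROM read timeouts are still accumulating.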

Could it be https://bst.cloudapps.cisco.com/bugsearch/bug/CSCul44421?

Can you upgrade at least the infrastructure to a newer release, 2.2.8?

Hi, I updated the FI to 3.1.2b.A, and the problem seems to be resolved.

I am still monitoring.

Thank you.

ssumichrast
Beginner

We had this issue recently when bringing up a domain that had only rack servers and no chassis. I suspect something went wrong because the domain initially had a chassis that was later removed.

We ended up wiping the domain and starting over since it was new, and that fixed our issue.

Other things to check:

- Default gateway is reachable via ICMP

- Chassis is properly connected: one IOM to one FI

- Twinax is not defective (we have run into bad twinax multiple times now)

- Reboot both FIs
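
The first item in the list above (gateway reachable via ICMP) is easy to script from a management workstation. The following is a rough Python sketch that shells out to the system `ping` command; it assumes `ping` is on the PATH and accepts `-c` (Linux/macOS) or `-n` (Windows) for the packet count. The address `192.0.2.1` is a documentation placeholder, not a value from this thread, and the function names are made up for illustration.

```python
import platform
import subprocess


def build_ping_command(host, count=3):
    """Build the ping argument list; the count flag differs on Windows."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def is_reachable(host, count=3, timeout=10):
    """Return True if ping exits with status 0 within the timeout."""
    try:
        completed = subprocess.run(
            build_ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or ping is not available on this machine.
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    gateway = "192.0.2.1"  # placeholder: replace with your default gateway
    print(gateway, "reachable" if is_reachable(gateway) else "NOT reachable")
```

This only tests reachability from wherever the script runs, so a pass here does not prove the FI management interface itself can reach the gateway.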
