
Cisco-Hta · 11-23-2016

Hi,

We have two 6248 FIs configured in cluster mode. Since last night all ports on the first FI have been down, and when I check I see:

show cluster extended-state
Cluster Id: 0xd65744c6a6b311e3-0xa438002a6a6afdc4

Start time: Wed Nov 23 09:42:26 2016
Last election time: Wed Nov 23 09:43:44 2016

A: UP, SUBORDINATE
B: UP, PRIMARY

A: memb state UP, lead state SUBORDINATE, mgmt services state: UP
B: memb state UP, lead state PRIMARY, mgmt services state: UP
heartbeat state PRIMARY_OK

INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP

HA NOT READY
No device connected to this Fabric Interconnect
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1730G5TP, state: inactive
Chassis 2, serial: FOX1730G5UE, state: inactive

show tech-support ethport

Ethernet      VLAN   Type Mode   Status  Reason                 Speed     Port
Interface                                                                 Ch #
--------------------------------------------------------------------------------
Eth1/1        4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/2        4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/3        4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/4        4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/5        1      eth  access down    Hardware failure       10G(D)    --
Eth1/6        1      eth  access down    Hardware failure       10G(D)    --
Eth1/7        1      eth  access down    Hardware failure       10G(D)    --
Eth1/8        1      eth  access down    Hardware failure       10G(D)    --
Eth1/9        1      eth  access down    Hardware failure       10G(D)    --
Eth1/10       1      eth  access down    Hardware failure       10G(D)    --
Eth1/11       1      eth  access down    Hardware failure       10G(D)    --
Eth1/12       1      eth  access down    Hardware failure       10G(D)    --
Eth1/13       1      eth  access down    Hardware failure       10G(D)    --
Eth1/14       1      eth  access down    Hardware failure       10G(D)    --
Eth1/15       1      eth  trunk  down    Hardware failure       10G(D)    10
Eth1/16       1      eth  trunk  down    Hardware failure       10G(D)    10
Eth1/17       4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/18       4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/19       4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/20       4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/21       4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/22       4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/23       4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/24       4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/25       4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/26       4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/27       4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/28       4044   eth  trunk  down    Hardware failure       10G(D)    --
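To confirm that every port really reports the same reason, rather than eyeballing 28 rows, a small script can parse the saved output. This is a minimal Python sketch, assuming the output above has been pasted into a string; the regex only covers the column layout shown here, and the function names `parse_ports` and `failed_ports` are made up for illustration.

```python
import re

# A few rows copied from the output above (paste the full output in practice).
SAMPLE = """\
Eth1/1        4044   eth  trunk  down    Hardware failure       10G(D)    --
Eth1/5        1      eth  access down    Hardware failure       10G(D)    --
Eth1/15       1      eth  trunk  down    Hardware failure       10G(D)    10
"""

# Interface, VLAN, type, mode, status, free-text reason, speed, port-channel.
# The reason is matched lazily so multi-word reasons like "Hardware failure" fit.
ROW = re.compile(
    r"^(?P<iface>Eth\d+/\d+)\s+(?P<vlan>\d+)\s+(?P<type>\S+)\s+(?P<mode>\S+)\s+"
    r"(?P<status>\S+)\s+(?P<reason>.+?)\s+(?P<speed>\S+)\s+(?P<portch>\S+)\s*$"
)


def parse_ports(text):
    """Return one dict per Ethernet row; header and separator lines are skipped."""
    ports = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            ports.append(m.groupdict())
    return ports


def failed_ports(ports, reason="Hardware failure"):
    """Names of interfaces that are down with the given reason."""
    return [p["iface"] for p in ports if p["status"] == "down" and p["reason"] == reason]


if __name__ == "__main__":
    ports = parse_ports(SAMPLE)
    bad = failed_ports(ports)
    print(f"{len(bad)}/{len(ports)} ports down with Hardware failure: {', '.join(bad)}")
```

For the trimmed sample this prints `3/3 ports down with Hardware failure: Eth1/1, Eth1/5, Eth1/15`. Ports with nothing connected would normally show a different reason, so a full match like this points away from cabling.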

Has anyone had this problem? Thank you.

Walter Dey · 11-23-2016

It seems there is a problem with the quorum, meaning the communication between the FI and the IOMs in the chassis is not OK.

Did you check in UCSM that you can see your chassis?

Cisco-Hta · 11-23-2016

Yes, I checked: all the chassis are OK with the second FI. The problem is that all the ports show Hardware failure, even the ones with nothing connected.

Cisco-Hta · 11-23-2016

Here are some screenshots.

Walter Dey · 11-23-2016

Which UCS version?

Did these errors happen out of the blue, or did you make any configuration changes? A power failure?

I would try rebooting the failing FI.

Cisco-Hta · 11-23-2016

Version 2.2(1c). I did not make any changes; this happened out of the blue. I tried rebooting the failing FI, but got the same result.

Walter Dey · 11-23-2016

Hi

I would open a TAC case. It could really be a hardware issue requiring an RMA.

I don't know whether this is a production environment.

You could disconnect the heartbeat, erase the configuration of the failing FI, and then rejoin it to the cluster, letting the healthy FI do the sync.

Cisco-Hta · 11-24-2016

I tried erasing the configuration and adding the FI to the cluster again, but got the same result:

Start time: Thu Nov 24 14:56:00 2016
Last election time: Thu Nov 24 14:57:06 2016

A: UP, SUBORDINATE
B: UP, PRIMARY

A: memb state UP, lead state SUBORDINATE, mgmt services state: UP
B: memb state UP, lead state PRIMARY, mgmt services state: UP
heartbeat state PRIMARY_OK

INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP

HA NOT READY
No device connected to this Fabric Interconnect
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1730G5TP, state: inactive
Chassis 2, serial: FOX1730G5UE, state: inactive

Fabric B, chassis-seeprom local IO failure:
FOX1730G5TP READ_FAILED, error: TIMEOUT, error code: 10, error count: 4
Fabric B, chassis-seeprom local IO failure:
FOX1730G5UE READ_FAILED, error: TIMEOUT, error code: 10, error count: 7
Warning: there are pending I/O errors on one or more devices, failover may not complete
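Comparing outputs across attempts (reboot, config erase, upgrade) is easier if the HA-related lines are pulled out programmatically. This is a minimal Python sketch, assuming the `show cluster extended-state` output has been saved as text; the regexes cover only the line formats shown in this thread, and `summarize` and its result keys are hypothetical names chosen for illustration.

```python
import re

# Lines copied from the second "show cluster extended-state" output above.
SAMPLE = """\
HA NOT READY
No device connected to this Fabric Interconnect
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1730G5TP, state: inactive
Chassis 2, serial: FOX1730G5UE, state: inactive

Fabric B, chassis-seeprom local IO failure:
FOX1730G5TP READ_FAILED, error: TIMEOUT, error code: 10, error count: 4
Fabric B, chassis-seeprom local IO failure:
FOX1730G5UE READ_FAILED, error: TIMEOUT, error code: 10, error count: 7
"""

CHASSIS = re.compile(r"^Chassis (\d+), serial: ([^,]+), state: (\S+)$")
IO_FAIL = re.compile(
    r"^(\S+) READ_FAILED, error: ([^,]+), error code: (\d+), error count: (\d+)$"
)


def summarize(text):
    """Extract HA readiness, HA-storage chassis states and seeprom read failures."""
    summary = {"ha_ready": None, "chassis": {}, "io_failures": {}}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: a healthy cluster prints "HA READY"; only
        # "HA NOT READY" appears in this thread.
        if line == "HA READY":
            summary["ha_ready"] = True
        elif line == "HA NOT READY":
            summary["ha_ready"] = False
        elif m := CHASSIS.match(line):
            # Keyed by chassis serial -> state (e.g. "inactive").
            summary["chassis"][m.group(2)] = m.group(3)
        elif m := IO_FAIL.match(line):
            # Keyed by chassis serial -> (error, error code, error count).
            summary["io_failures"][m.group(1)] = (
                m.group(2), int(m.group(3)), int(m.group(4))
            )
    return summary


if __name__ == "__main__":
    s = summarize(SAMPLE)
    print("HA ready:", s["ha_ready"])
    for serial, (err, code, count) in s["io_failures"].items():
        print(f"{serial}: seeprom read {err} (code {code}), {count} errors")
```

Keying everything by chassis serial makes it straightforward to diff the state before and after a change. Here both chassis show seeprom read timeouts from fabric B, which fits the FI/IOM communication theory. The walrus operator requires Python 3.8 or later.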

Walter Dey · 11-24-2016

Could it be CSCul44421 (https://bst.cloudapps.cisco.com/bugsearch/bug/CSCul44421)?

Can you upgrade at least the infrastructure to a newer release, such as 2.2.8?

Cisco-Hta · 12-19-2016

Hi, I updated the FI to 3.1.2b.A and the problem seems to be resolved.

I am still monitoring it.

Thank you.

ssumichrast · 11-23-2016

We had this issue recently when bringing up a domain consisting only of rack servers, with no chassis. I suspect something went wrong because the domain initially had a chassis that was later removed.

We ended up wiping the domain and starting over, since it was new, and that fixed our issue.

Other things to check:

- Default Gateway is reachable via ICMP

- Chassis is properly connected: each IOM to only one FI

- Twinax is not defective (we have run into bad twinax multiple times now)

- Reboot both FIs

ucs 6248 HA NOT READY