UCSM HA functionality - quorum chassis

Today I tested the HA functionality.

We have two FIs and two chassis in a stretched cluster. When I switched off the power to chassis 2 and to the primary FI (FI-B), the subordinate FI-A did not take over as primary. UCSM was unreachable because the primary FI was down.

Chassis 1 and FI-A were still online. I had to connect to the FI-A CLI and force it to become primary. Once FI-A was primary, UCSM was reachable again.

Can someone explain what happened and how I can fix this?

I read that with an even number of chassis, as in my case (two chassis), one chassis is not designated as a quorum chassis, so that only an odd number of quorum chassis participate in the HA cluster.
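The usual reason for keeping an odd number of voters comes from general quorum arithmetic, not from anything UCS-specific: n voters need a strict majority of n // 2 + 1, so the cluster tolerates n - (n // 2 + 1) failures. A minimal sketch (the function names are illustrative only) shows that an even count buys no extra tolerance over the odd count just below it:

```python
def majority(n: int) -> int:
    """Smallest strict majority of n voters."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many voters can fail while a majority is still reachable."""
    return n - majority(n)


# Print majority size and fault tolerance for 1 to 5 voters.
for n in range(1, 6):
    print(f"{n} voters: majority {majority(n)}, tolerates {tolerated_failures(n)}")
```

Two voters tolerate no failures, exactly like one, and four tolerate one, exactly like three, which is why an extra even member is typically left out of the vote.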

When I run "show cluster extended-state" on my FI, both chassis are listed as active. Does that mean both are quorum chassis? How can I find out which chassis' SEEPROM is used for HA?

Which version of UCS are you running?

Before the test, did you run "show cluster state" and "show cluster extended-state" in the CLI of each FI?

And after powering off, the same on the surviving FI?

Also, do you have a management interface monitoring policy configured?


I am using version 2.2.1(c).

The management interface monitoring policy is enabled: Ping Gateway, 90 sec, 3 faults.


I tested again with several scenarios and logged the cluster state each time.

HA works if chassis 2 stays online, or if it is shut down before the primary FI loses power. HA fails only when the primary FI and chassis 2 lose power at the same time.

So I think that when both chassis are online, chassis 2 is the only quorum chassis. If the quorum chassis and the primary FI fail at the same time, HA does not work, because the cluster state reports chassis 2 as "active with errors".
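The test results above can be reproduced with a toy model. This is a hypothetical sketch, not Cisco's UCSM implementation: it assumes the surviving FI needs two of three votes (FI-A, FI-B, quorum chassis) to take over, and that a still-running primary moves the quorum role to another chassis when the current quorum chassis goes down first. The class, method and chassis names are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of two FIs plus one quorum (witness) chassis.

    Assumption: illustrates the behaviour reported in this thread;
    it is NOT Cisco's UCSM implementation.
    """

    chassis_up: dict[str, bool]
    quorum: str
    primary: str = "FI-B"
    fis_up: dict[str, bool] = field(
        default_factory=lambda: {"FI-A": True, "FI-B": True}
    )

    def fail_chassis(self, name: str) -> None:
        self.chassis_up[name] = False
        # A live primary can move the quorum role to another running chassis.
        if name == self.quorum and self.fis_up[self.primary]:
            for other, up in self.chassis_up.items():
                if up:
                    self.quorum = other
                    break

    def fail_fi(self, name: str) -> None:
        self.fis_up[name] = False
        if name == self.primary:
            self._try_failover()

    def fail_together(self, chassis: str, fi: str) -> None:
        # Simultaneous loss: no live primary is left to re-designate quorum.
        self.chassis_up[chassis] = False
        self.fis_up[fi] = False
        if fi == self.primary:
            self._try_failover()

    def force_primary(self, fi: str) -> None:
        # Manual override, like forcing FI-A to primary from its CLI.
        self.primary = fi

    def _try_failover(self) -> None:
        survivor = next((fi for fi, up in self.fis_up.items() if up), None)
        if survivor is None:
            return
        # Voters: FI-A, FI-B and the quorum chassis; a majority is 2 of 3.
        votes = 1 + (1 if self.chassis_up[self.quorum] else 0)
        if votes >= 2:
            self.primary = survivor


def new_cluster() -> Cluster:
    # Both chassis up, chassis 2 holding the quorum role (the poster's guess).
    return Cluster(chassis_up={"chassis-1": True, "chassis-2": True},
                   quorum="chassis-2")


a = new_cluster()
a.fail_chassis("chassis-2")  # chassis 2 goes down first...
a.fail_fi("FI-B")            # ...then the primary FI
print("sequential:", a.primary)

b = new_cluster()
b.fail_together("chassis-2", "FI-B")
print("simultaneous:", b.primary)
```

In the sequential case the model fails over to FI-A; in the simultaneous case the survivor holds only one of three votes, so FI-B (powered off) stays primary until `force_primary("FI-A")` is applied, matching the manual workaround described at the start of the thread.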




This must be a bug! I would raise a TAC case immediately.

I wouldn't be surprised if this special case hasn't been tested.

What is the probability that the primary FI AND the quorum chassis fail at the same time?

No excuse! You are 100% right. I hope this will not become a business-critical issue.

