02-06-2014 06:30 PM - edited 03-01-2019 11:30 AM
Hi All,
On upgrading from 2.1(1e) to 2.2(1b) we have seen the same issue on two different clusters.
After activating the new image on the subordinate FI and attempting to switch the primary role over to it, 'show cluster state' reports that the primary is unable to communicate with the UCSM controller.
ucs-A# show fabric-interconnect version
Fabric Interconnect A:
Running-Kern-Vers: 5.0(3)N2(2.11e)
Running-Sys-Vers: 5.0(3)N2(2.11e)
Package-Vers: 2.1(1e)A
Startup-Kern-Vers: 5.0(3)N2(2.11e)
Startup-Sys-Vers: 5.0(3)N2(2.11e)
Act-Kern-Status: Ready
Act-Sys-Status: Ready
Bootloader-Vers: v1.5.0(11/30/10)
Fabric Interconnect B:
Running-Kern-Vers: 5.2(3)N2(2.21b)
Running-Sys-Vers: 5.2(3)N2(2.21b)
Package-Vers: 2.2(1b)A
Startup-Kern-Vers: 5.2(3)N2(2.21b)
Startup-Sys-Vers: 5.2(3)N2(2.21b)
Act-Kern-Status: Ready
Act-Sys-Status: Ready
Bootloader-Vers: v1.5.0(11/30/10)
ucs-A# show system version
UCSM:
Running-Vers: 2.2(1b)
Package-Vers: 2.2(1b)A
Activate-Status: Ready
ucs-A# show cluster state
Cluster Id: 65030500-7707-11e0-87e4-000573d2eec4
Unable to communicate with UCSM controller
Has anyone seen similar issues when upgrading to 2.2(1b)?
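For anyone wanting to dig deeper, the cluster and management-process state can also be checked from the FI's local management shell; a sketch of the standard commands (output omitted here):
ucs-A# connect local-mgmt
ucs-A(local-mgmt)# show cluster extended-state
ucs-A(local-mgmt)# show pmon state
'show cluster extended-state' adds per-service detail to 'show cluster state', and 'show pmon state' lists the state of the UCSM management processes on that FI.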
02-06-2014 11:09 PM
I assume you did a manual install (not an infrastructure auto-install)?
Is your UCSM already upgraded and running 2.2(1b)? If not, that might be your problem.
02-07-2014 12:05 AM
Yes, this was a manual upgrade.
As you can see from the output in my original post, UCSM has been upgraded to 2.2(1b) (from 2.1(1e)) and the subordinate FI has been upgraded to 2.2(1b). However, on attempting to force the subordinate to lead so that the primary FI can be upgraded, both 'show cluster state' and 'force lead b' on the primary return an error.
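For clarity, the failover attempt was made from the local management shell (a sketch, assuming the standard 'cluster lead b' local-mgmt command is what is behind the 'force lead b' wording above):
ucs-A# connect local-mgmt
ucs-A(local-mgmt)# cluster lead b
On both of our affected domains, this is where the error comes back instead of the subordinate taking over.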
02-07-2014 01:36 AM
Just try to install the other FI!
02-10-2014 01:33 PM
I've been advised that activating the newer kernel and system images on the primary will cause a management failover; however, that seems to go against the best practices outlined in the upgrade guide, especially those pertaining to confirming the HA state prior to proceeding with FI upgrades.
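For context, the activation being suggested is the standard UCSM CLI sequence sketched below (the kernel/system version strings are the 2.2(1b) ones from my first post); the commit-buffer is what triggers the reboot and hence the hard failover:
ucs-A# scope fabric-interconnect a
ucs-A /fabric-interconnect # activate firmware kernel-version 5.2(3)N2(2.21b) system-version 5.2(3)N2(2.21b)
ucs-A /fabric-interconnect* # commit-buffer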
To have the primary FI report that it is unable to communicate with the UCSM controller seems to me to be a fairly serious issue.
02-11-2014 12:25 AM
Questions / comments:
- Is this a production environment?
- Very strange that you see this issue in 2 UCS domains!
- What other error messages do you see; e.g., are the IOMs all OK?
- In any case, I would advise using the auto-install feature.
- I have seen tons of transient error messages during upgrades from 2.0.x / 2.1.x to 2.2(1b), all of which disappeared by the end.
- Forcing the subordinate to lead so that the primary FI can be upgraded is optional; the failover happens automatically when you upgrade the master FI. There is no benefit at all, and you lose your UCSM session either way!
02-11-2014 01:14 PM
- We have seen this happen in both a test and production environment.
- We have seen this happen in two domains.
- All other components are OK.
- We are somewhat hesitant to use the autoinstall feature as we want to avoid any possibility of unscheduled reboots.
- My interpretation of the upgrade documentation is that forcing the subordinate to lead prior to upgrading the master is best practice, done to confirm HA as a preliminary step before upgrading the master. I have no idea how the auto-install feature is implemented internally, but one may well suspect it confirms the HA/cluster state on the master via a similar process before upgrading the FI.
I understand that the master role should (and almost certainly will) fail over if the master is rebooted, but the inability to query the cluster state from the master FI, and having to rely on a hard failover during an upgrade, still seems like a glaring issue to me.
02-12-2014 12:10 AM
My comments:
- We have seen issues upgrading to 2.2(1b); almost 100% of those were manual upgrades.
- The infrastructure auto-install order is:
1) UCSM (you lose your UCSM session)
2) IOM upgrade and activation with the "set startup version only" flag set
3) Subordinate FI upgrade (with reboot)
4) The install stops! You have to check that both fabrics are properly up and running and that the cluster state is HA READY (see the sketch below).
5) You ACK to go on with the upgrade of the master FI (you lose the UCSM session a second time).
- "...should be done to confirm HA as a preliminary step to upgrading the master..." Agreed, see 4) above! And it has nothing to do with "...force the subordinate to lead so that the primary FI...".
- "...but the inability to query the cluster state from the master FI..."? You can open an SSH session to either FI, independent of master or subordinate, and run 'show cluster state'.
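For reference, this is roughly what the healthy check at step 4) should look like (a sketch; the cluster ID is reused from the original post, and the hostname is just an example):
ucs-A(local-mgmt)# show cluster state
Cluster Id: 65030500-7707-11e0-87e4-000573d2eec4
A: UP, PRIMARY
B: UP, SUBORDINATE
HA READY
If it reports HA NOT READY instead, the reason is listed underneath, and you should not ACK the master FI upgrade until it clears.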
02-16-2014 01:24 AM
Here's a blog post with more on this issue:
http://keepingitclassless.net/2014/01/cisco-ucs-unable-to-communicate-with-ucsm-controller/
I ran into this problem on my UCS as well when upgrading to 2.2(1b). In my case, I tried to roll back the FI to the previous version. This was unsuccessful and led to the FI crashing to a bash prompt, so don't do that.
Once I got that fixed, the only available option was to proceed with the upgrade on the Primary. Quite nerve-racking in a production environment. Fortunately the HA worked fine and the primary upgraded successfully.
05-09-2017 02:32 AM
Dear friends,
Could you please help me? I want to join Cisco data center classes. I have got a new job in a data center, but I don't have a good command of Nexus.
Could you please help me find a good Nexus trainer?
Thank you so much!