
Amit Vyas · ‎06-14-2014

Hi,

We are running UAT on our newly installed 6248 FIs, configured as follows, and are observing some unusual behavior:

FI-A - 172.10.15.51 (Primary)

FI-B - 172.10.15.52 (Secondary)

Cluster IP - 172.10.15.50

When we shut down FI-A (Primary), the Cluster IP does not fail over to FI-B, and FI-B is not promoted to Primary.

Is there a known issue with 2.2(1b)?

Walter Dey · ‎06-14-2014

Did you check the cluster status before the test, using the CLI command show cluster status?

Can you clarify what you mean by shutting down FI-A? Powering it off?

After shutting down FI-A, SSH to FI-B and run show cluster status there. What does it report?

Amit Vyas · ‎06-14-2014

Below is the output before powering off FI-A:

================================================

lavender-A(local-mgmt)# show cluster extended-state
Cluster Id: 0x7bce8766ecdc11e3-0xa1b4002a6ac23b41

Start time: Sat Jun 14 21:05:55 2014
Last election time: Sat Jun 14 21:17:57 2014

A: UP, PRIMARY
B: UP, SUBORDINATE

A: memb state UP, lead state PRIMARY, mgmt services state: UP
B: memb state UP, lead state SUBORDINATE, mgmt services state: UP
   heartbeat state PRIMARY_OK

INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP

HA READY
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1812GNZZ, state: active
Chassis 2, serial: FOX1813GFT7, state: active
Chassis 3, serial: FOX1814G5DR, state: active
lavender-A(local-mgmt)#

================================================

Below is the output from FI-B after shutting down FI-A:

================================================

lavender-B(local-mgmt)# show cluster extended-state
Cluster Id: 0x7bce8766ecdc11e3-0xa1b4002a6ac23b41

Start time: Wed Jun 11 19:09:13 2014
Last election time: Sat Jun 14 13:45:18 2014

B: UP, PRIMARY, (Management services: SWITCHOVER IN PROGRESS)
A: DOWN, INAPPLICABLE

B: memb state UP, lead state PRIMARY, mgmt services state: INVALID
A: memb state DOWN, lead state INAPPLICABLE, mgmt services state: DOWN
   heartbeat state SECONDARY_FAILED

INTERNAL NETWORK INTERFACES:
eth1, DOWN
eth2, DOWN

HA NOT READY
Management services: switchover in progress on local Fabric Interconnect
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1812GNZZ, state: active
Chassis 2, serial: FOX1813GFT7, state: active
Chassis 3, serial: FOX1814G5DR, state: active
lavender-B(local-mgmt)#

================================================
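For anyone repeating this test, comparing the two outputs by eye gets tedious. Below is a minimal Python sketch that pulls the per-FI state, the internal link state, and the HA readiness line out of pasted show cluster extended-state text. The function names and the dictionary layout are made up for illustration, and the patterns are based only on the output format shown in this thread, so other UCSM releases may need adjustments.

```python
import re

# Trimmed copy of the post-shutdown output from this thread.
SAMPLE = """\
B: UP, PRIMARY, (Management services: SWITCHOVER IN PROGRESS)
A: DOWN, INAPPLICABLE

B: memb state UP, lead state PRIMARY, mgmt services state: INVALID
A: memb state DOWN, lead state INAPPLICABLE, mgmt services state: DOWN
   heartbeat state SECONDARY_FAILED

INTERNAL NETWORK INTERFACES:
eth1, DOWN
eth2, DOWN

HA NOT READY
Management services: switchover in progress on local Fabric Interconnect
"""


def parse_extended_state(text):
    """Extract per-FI state, internal link state and HA readiness (hypothetical helper)."""
    summary = {"fabric": {}, "links": {}, "ha_ready": None}

    # Detail lines such as "A: memb state UP, lead state PRIMARY, mgmt services state: UP"
    detail = r"\b([AB]): memb state (\w+), lead state (\w+), mgmt services state: (\w+)"
    for fi, memb, lead, mgmt in re.findall(detail, text):
        summary["fabric"][fi] = {"memb": memb, "lead": lead, "mgmt": mgmt}

    # Internal cluster links such as "eth1, UP"
    for name, state in re.findall(r"\b(eth\d+), (\w+)", text):
        summary["links"][name] = state

    # Check the negative form first, because "HA READY" alone would not
    # distinguish the two cases reliably if the order were reversed.
    if "HA NOT READY" in text:
        summary["ha_ready"] = False
    elif "HA READY" in text:
        summary["ha_ready"] = True
    return summary


def primary_is_serving(summary):
    """Assumption: a healthy cluster has one FI that is PRIMARY with mgmt services UP."""
    return any(
        state["lead"] == "PRIMARY" and state["mgmt"] == "UP"
        for state in summary["fabric"].values()
    )


if __name__ == "__main__":
    result = parse_extended_state(SAMPLE)
    print(result)
    print("primary serving:", primary_is_serving(result))
```

Run against the post-shutdown output, it shows FI-B as PRIMARY but with management services INVALID, both internal links DOWN, and HA not ready, which matches the symptom that the VIP never moves.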

It's really strange: even when I do a manual cluster lead change, it takes a considerable amount of time for the VIP to switch from FI-A to FI-B and vice versa.

Regards,

Amit Vyas

Walter Dey · ‎06-14-2014

I assume that initially you can ping FI-A, FI-B, and the VIP?

As the output shows, FI-A is now down and FI-B is primary. Can you ping FI-B and the VIP in this state?

Amit Vyas · ‎06-16-2014

I can only ping FI-B. I am unable to ping the VIP until FI-A comes back up.
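To put a number on how long the VIP stays unreachable during such a test, a small probe loop helps. The following is a rough Python sketch, not a Cisco tool: the function names are hypothetical, and ping_once assumes the Linux iputils ping flags (-c for count, -W for timeout in seconds). It pings each address once per interval and reports the longest unreachable stretch per address.

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo succeeds (assumes Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def longest_outage(samples):
    """Given [(timestamp, reachable), ...] in time order, return the longest
    unreachable stretch in seconds. An outage still open at the end of the
    samples is counted up to the last timestamp."""
    longest = 0.0
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts  # outage starts
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)  # outage ends
            down_since = None
    if down_since is not None:
        longest = max(longest, samples[-1][0] - down_since)
    return longest


def watch(hosts, duration_s=300, interval_s=1.0, probe=ping_once):
    """Probe every host once per interval and return {host: longest outage}."""
    history = {host: [] for host in hosts}
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        for host in hosts:
            history[host].append((time.monotonic(), probe(host)))
        time.sleep(interval_s)
    return {host: longest_outage(samples) for host, samples in history.items()}


# Example call (addresses taken from this thread; start it before powering off FI-A):
#   watch(["172.10.15.50", "172.10.15.51", "172.10.15.52"])
```

With a one-second interval the result is only accurate to about a second plus the ping timeout, which is enough to tell a switchover of a few seconds from a VIP that stays down until FI-A returns.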

Walter Dey · ‎06-16-2014

Must be a bug!

By the way, did you set this?

admin->managementInterfaces->managementInterfaceMonitoringPolicy

Admin status -> enabled
Monitoring mechanism -> MII status

Walter Dey · ‎06-16-2014

CSCum82888

After an upgrade to UCSM 2.2.1b we see the following symptoms:

- No UCSM GUI access.

- Virtual IP is not reachable.

- Virtual IP cannot be accessed either by GUI or CLI/SSH.

- Individual FI can be accessed via SSH but not via http.

Conditions:

- Issue occurred after an upgrade to version 2.2.1b.

- The issue can happen in the following two conditions:

1. The default keyring is deleted and the system is upgraded to 2.2.1<x>.

2. The default keyring is deleted on a system running 2.2.1<x> and the system is rebooted.

Workaround:

- Make the key and certificate links used by apache httpd point to any valid key/certificate (by deleting and re-creating the links). This requires loading the debug plugin.

The situation can be avoided by:

1. Not deleting the default keyring (or recreating it if deleted) before upgrading to 2.2.1<x>.

2. Not deleting the default keyring even after upgrading to 2.2.1<x>.

Further Problem Description:

The issue is caused by a deadlock.

Amit Vyas · ‎06-17-2014

I also assume it might be a bug.

Yes, Admin Status is enabled and the Monitoring Mechanism is MII Status.

But that setting didn't help either. When I restarted the primary FI, I lost ping to both the VIP and FI-A, and the VIP only responded again after FI-A came back up.

Walter Dey · ‎06-17-2014

Hi Amit

I hope you have seen the bug ID I posted (CSCum82888)!

Walter.

Amit Vyas · ‎06-19-2014

I don't think I am hitting this bug, because these devices shipped with 2.2(1b) and were not upgraded to it.

I am not sure what the issue could be.

Walter Dey · ‎06-19-2014

Understood, but according to the bug description it could also happen under the second condition:

2. When default keyring is deleted on a system running 2.2.1<x> and the system is rebooted.

Amit Vyas · ‎06-19-2014

No, I haven't deleted the keyring, but I have rebooted the FIs multiple times.

Walter Dey · ‎06-19-2014

I would open a TAC case and/or upgrade the infrastructure with Auto Install to the latest or the long-lived Cisco recommended release (2.2.1d).

Amit Vyas · ‎06-23-2014

I have raised an SR and a TAC engineer is working on it.

Meanwhile, I would like to understand the following two things:

1. How does failover happen between Fabric Interconnects, i.e. how is the Primary/Subordinate role transferred from one to the other?
2. How long should it ideally take for the VIP sub-interface to be created on the peer fabric interconnect when the Primary FI reboots or is powered down?

UCSM 2.2.(1b) Cluster IP Failover Issue