UCSM 2.2(1b) Cluster IP Failover Issue

Amit Vyas
Level 1

Hi,

 

We are doing UAT on our newly installed 6248 FIs, configured as shown below, and we are observing some unusual behavior.

 

FI-A - 172.10.15.51 (Primary)

FI-B - 172.10.15.52 (Secondary)

Cluster IP - 172.10.15.50

 

When we shut down FI-A (Primary), the Cluster IP does not fail over to FI-B, and FI-B is not promoted to Primary.

Is there any issue related to 2.2(1b)?

13 Replies

Walter Dey
VIP Alumni

Did you check the cluster status before the test, with the CLI command show cluster state?

Can you clarify what you mean by shutting down FI-A? Did you power it off?

After shutting down FI-A, can you SSH to FI-B and run show cluster state again?

Below is the output before powering off FI-A

================================================

lavender-A(local-mgmt)# show cluster extended-state
Cluster Id: 0x7bce8766ecdc11e3-0xa1b4002a6ac23b41

Start time: Sat Jun 14 21:05:55 2014
Last election time: Sat Jun 14 21:17:57 2014

A: UP, PRIMARY
B: UP, SUBORDINATE

A: memb state UP, lead state PRIMARY, mgmt services state: UP
B: memb state UP, lead state SUBORDINATE, mgmt services state: UP
   heartbeat state PRIMARY_OK

INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP

HA READY
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1812GNZZ, state: active
Chassis 2, serial: FOX1813GFT7, state: active
Chassis 3, serial: FOX1814G5DR, state: active
lavender-A(local-mgmt)#

================================================

 

Below is the output from FI-B after shutting down FI-A

================================================

lavender-B(local-mgmt)# show cluster extended-state
Cluster Id: 0x7bce8766ecdc11e3-0xa1b4002a6ac23b41

Start time: Wed Jun 11 19:09:13 2014
Last election time: Sat Jun 14 13:45:18 2014

B: UP, PRIMARY, (Management services: SWITCHOVER IN PROGRESS)
A: DOWN, INAPPLICABLE

B: memb state UP, lead state PRIMARY, mgmt services state: INVALID
A: memb state DOWN, lead state INAPPLICABLE, mgmt services state: DOWN
   heartbeat state SECONDARY_FAILED

INTERNAL NETWORK INTERFACES:
eth1, DOWN
eth2, DOWN

HA NOT READY
Management services: switchover in progress on local Fabric Interconnect
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1812GNZZ, state: active
Chassis 2, serial: FOX1813GFT7, state: active
Chassis 3, serial: FOX1814G5DR, state: active
lavender-B(local-mgmt)#

================================================

It's really strange that when I do a manual cluster lead change, it takes a good amount of time to switch the VIP from FI-A to FI-B and vice versa.

 

Regards,

Amit Vyas
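For anyone who wants to compare the two outputs above in a script rather than by eye, here is a minimal Python sketch that parses show cluster extended-state text of the shape pasted in this thread. The function names and the rule "switchover is finished when the node is PRIMARY and its mgmt services state is UP" are my own assumptions for illustration, not documented UCSM behavior, and the regular expressions only cover the lines visible in this thread.

```python
import re

# Patterns cover only the line shapes seen in the pasted output (assumption).
SUMMARY = re.compile(r"^([AB]): (UP|DOWN), (\w+)")
DETAIL = re.compile(
    r"^([AB]): memb state (\w+), lead state (\w+), mgmt services state: (\w+)"
)
IFACE = re.compile(r"^(eth\d+), (UP|DOWN)$")


def parse_extended_state(text):
    """Turn 'show cluster extended-state' text into a small dict."""
    state = {"nodes": {}, "interfaces": {}, "ha_ready": None}
    for raw in text.splitlines():
        line = raw.strip()
        m = DETAIL.match(line)
        if m:
            node = state["nodes"].setdefault(m.group(1), {})
            node.update(memb=m.group(2), lead=m.group(3), mgmt=m.group(4))
            continue
        m = SUMMARY.match(line)
        if m:
            node = state["nodes"].setdefault(m.group(1), {})
            node.update(memb=m.group(2), lead=m.group(3))
            continue
        m = IFACE.match(line)
        if m:
            state["interfaces"][m.group(1)] = m.group(2)
        elif line == "HA READY":
            state["ha_ready"] = True
        elif line == "HA NOT READY":
            state["ha_ready"] = False
    return state


def switchover_finished(state, node):
    """Hypothetical rule: node leads the cluster and its mgmt services are UP."""
    info = state["nodes"].get(node, {})
    return info.get("lead") == "PRIMARY" and info.get("mgmt") == "UP"


# Excerpt of the FI-B output posted above, taken after FI-A was shut down.
AFTER_FI_A_DOWN = """\
B: UP, PRIMARY, (Management services: SWITCHOVER IN PROGRESS)
A: DOWN, INAPPLICABLE

B: memb state UP, lead state PRIMARY, mgmt services state: INVALID
A: memb state DOWN, lead state INAPPLICABLE, mgmt services state: DOWN
   heartbeat state SECONDARY_FAILED

INTERNAL NETWORK INTERFACES:
eth1, DOWN
eth2, DOWN

HA NOT READY
"""

state = parse_extended_state(AFTER_FI_A_DOWN)
print(state["ha_ready"], switchover_finished(state, "B"))
```

With the excerpt above, the parsed state shows FI-B holding the PRIMARY lead state while its management services are still INVALID, which matches the symptom of the VIP not being reachable.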

I assume that initially you can ping FI-A, FI-B, and the VIP?

Now, as can be seen, FI-A is down and FI-B is primary. Can you now ping FI-B and the VIP?

I can only ping FI-B; I am unable to ping the VIP until FI-A comes back up.
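To put a number on how long the VIP stays unreachable, a small polling script can help. This is only a sketch: ping_once assumes the Linux iputils ping flags (-c for count, -W for timeout in seconds), and measure_outage, its parameters, and the 30-minute default window are hypothetical names and values chosen for illustration.

```python
import subprocess
import time


def ping_once(host):
    """Return True if one ICMP echo succeeds (assumes Linux iputils ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def measure_outage(probe, interval=1.0, max_wait=1800.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until probe() succeeds.

    Returns the seconds elapsed before the first success, or None if
    max_wait passes without one. clock and sleep are injectable so the
    loop can be exercised without a real network.
    """
    start = clock()
    while clock() - start <= max_wait:
        if probe():
            return clock() - start
        sleep(interval)
    return None
```

Started right after powering off FI-A, a call such as measure_outage(lambda: ping_once("172.10.15.50")) would report how many seconds passed before the cluster IP answered again, or None if it never did within the window. That figure is useful to attach to a TAC case.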

Must be a bug!

By the way, did you set this?

admin->managementInterfaces->managementInterfaceMonitoringPolicy

Admin status -> enable
Monitoring mechanism -> MII status

CSCum82888

 

After an upgrade to UCSM 2.2.1b we see the following symptoms:
- No UCSM GUI access.
- Virtual IP is not reachable.
- Virtual IP cannot be accessed either by GUI or CLI/SSH.
- Individual FI can be accessed via SSH but not via HTTP.

Conditions:
- Issue occurred after an upgrade to version 2.2.1b.

 
- The issue can happen under either of the following two conditions:
1. The default keyring is deleted and the system is upgraded to 2.2.1<x>.
2. The default keyring is deleted on a system running 2.2.1<x> and the system is rebooted.

 
Workaround:
- Make the key and certificate links used by Apache httpd point to any valid key/certificate (by deleting and re-creating the links). This requires loading the debug plugin.

 
The situation can be avoided by:
1. Not deleting the default keyring (or recreating it if deleted) before upgrading to 2.2.1<x>.
2. Not deleting the default keyring even after upgrading to 2.2.1<x>.

Further Problem Description:
The issue is caused by a deadlock.

 

I also assume it might be a bug.

Yes, Admin status is enabled and the Monitoring Mechanism is MII Status.

But that setting didn't help either. When I restarted the primary FI, I lost ping to both the VIP and FI-A, and the VIP only responded again after FI-A came back up.

 

Hi Amit

I hope you have seen the bug ID CSCum82888 posted in this thread!

Walter.
 

I don't think I am hitting this bug, because these devices were shipped with 2.2(1b) rather than upgraded to it.

I'm not sure what the issue could be.

 

Understood, but it could also happen under the second condition:

2. The default keyring is deleted on a system running 2.2.1<x> and the system is rebooted.

No, I haven't deleted the keyring, but I have rebooted the FIs multiple times.

I would open a TAC case and/or upgrade the infrastructure with Auto Install to the latest release, or rather to the long-lived Cisco-recommended release (2.2.1d).

I have raised an SR and a TAC engineer is working on it.

Meanwhile, I would like to understand the following two things:

  • How does failover happen on the Fabric Interconnects, i.e. how is the primary/subordinate role transferred from one to the other?
  • What is the ideal time for the VIP sub-interface to be created on the peer Fabric Interconnect when the primary FI is rebooted or powered down?

 
