
UCSM 2.2(1b) Cluster IP Failover Issue

Amit Vyas
Level 1

Hi,

We are doing UAT testing of our newly installed 6248 FIs (details below) and are observing some unusual behavior:

FI-A - 172.10.15.51 (Primary)

FI-B - 172.10.15.52 (Secondary)

Cluster IP - 172.10.15.50

When we shut down FI-A (primary), the cluster IP does not fail over to FI-B, and FI-B is not promoted to primary.

Is there a known issue with 2.2(1b)?


Walter Dey
VIP Alumni

Did you check the cluster state before the test (CLI: show cluster state)?

Can you clarify what you mean by shutting down FI-A? Did you power it off?

After shutting down FI-A, SSH to FI-B and run show cluster state. What does it report?

Below is the output before powering off FI-A:

================================================

lavender-A(local-mgmt)# show cluster extended-state
Cluster Id: 0x7bce8766ecdc11e3-0xa1b4002a6ac23b41

Start time: Sat Jun 14 21:05:55 2014
Last election time: Sat Jun 14 21:17:57 2014

A: UP, PRIMARY
B: UP, SUBORDINATE

A: memb state UP, lead state PRIMARY, mgmt services state: UP
B: memb state UP, lead state SUBORDINATE, mgmt services state: UP
   heartbeat state PRIMARY_OK

INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP

HA READY
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1812GNZZ, state: active
Chassis 2, serial: FOX1813GFT7, state: active
Chassis 3, serial: FOX1814G5DR, state: active
lavender-A(local-mgmt)#

================================================

Below is the output after shutting down FI-A:

================================================

lavender-B(local-mgmt)# show cluster extended-state
Cluster Id: 0x7bce8766ecdc11e3-0xa1b4002a6ac23b41

Start time: Wed Jun 11 19:09:13 2014
Last election time: Sat Jun 14 13:45:18 2014

B: UP, PRIMARY, (Management services: SWITCHOVER IN PROGRESS)
A: DOWN, INAPPLICABLE

B: memb state UP, lead state PRIMARY, mgmt services state: INVALID
A: memb state DOWN, lead state INAPPLICABLE, mgmt services state: DOWN
   heartbeat state SECONDARY_FAILED

INTERNAL NETWORK INTERFACES:
eth1, DOWN
eth2, DOWN

HA NOT READY
Management services: switchover in progress on local Fabric Interconnect
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1812GNZZ, state: active
Chassis 2, serial: FOX1813GFT7, state: active
Chassis 3, serial: FOX1814G5DR, state: active
lavender-B(local-mgmt)#

================================================

It's also strange that when I do a manual cluster lead change, it takes a considerable amount of time to move the VIP from FI-A to FI-B and vice versa.

Regards,

Amit Vyas
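To compare the before and after outputs above without reading them by eye, a minimal Python sketch could parse the per-FI lines of show cluster extended-state and report whether HA is ready and whether a given FI has finished taking over. The function names are hypothetical, and treating "lead state PRIMARY" plus "mgmt services state: UP" as a finished switchover is an assumption drawn from the post-shutdown output, where FI-B is PRIMARY but its management services are INVALID.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Per-FI detail line as printed by "show cluster extended-state", e.g.
# "A: memb state UP, lead state PRIMARY, mgmt services state: UP"
MEMBER_RE = re.compile(
    r"^(?P<fi>[AB]): memb state (?P<memb>\w+), "
    r"lead state (?P<lead>\w+), mgmt services state: (?P<mgmt>\w+)$"
)


@dataclass
class MemberState:
    memb: str  # membership state, e.g. UP / DOWN
    lead: str  # PRIMARY / SUBORDINATE / INAPPLICABLE
    mgmt: str  # management services state, e.g. UP / INVALID / DOWN


@dataclass
class ClusterState:
    members: Dict[str, MemberState]
    ha_ready: bool


def parse_extended_state(text: str) -> ClusterState:
    """Extract per-FI states and the HA READY flag from the CLI output."""
    members: Dict[str, MemberState] = {}
    ha_ready = False
    for raw in text.splitlines():
        line = raw.strip()
        m = MEMBER_RE.match(line)
        if m:
            members[m["fi"]] = MemberState(m["memb"], m["lead"], m["mgmt"])
        elif line == "HA READY":  # "HA NOT READY" deliberately does not match
            ha_ready = True
    return ClusterState(members, ha_ready)


def switchover_finished(state: ClusterState, fi: str) -> bool:
    """Assumed criterion: the FI is PRIMARY and its management services are UP."""
    member = state.members.get(fi)
    return member is not None and member.lead == "PRIMARY" and member.mgmt == "UP"
```

Fed the post-shutdown output, parse_extended_state reports FI-B as PRIMARY with management services INVALID and ha_ready False, so switchover_finished(state, "B") returns False, which is consistent with the VIP being unreachable at that point.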

I assume that initially you can ping FI-A, FI-B, and the VIP?

As the output shows, FI-A is down and FI-B is primary. Can you now ping FI-B and the VIP?

I can only ping FI-B; I am unable to ping the VIP until FI-A comes back up.
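To put a number on how long the VIP stays unreachable (and on the slow manual lead change mentioned earlier), a minimal Python sketch could poll the VIP until it answers again. It swaps ICMP ping for a plain TCP connect, since raw ICMP sockets usually need root; polling port 443 (the UCSM GUI over HTTPS) is an assumption, and seconds_until_reachable is a hypothetical helper name.

```python
import socket
import time
from typing import Optional


def seconds_until_reachable(host: str, port: int, timeout: float,
                            interval: float = 1.0) -> Optional[float]:
    """Poll host:port with TCP connects (a stand-in for ICMP ping).

    Returns the seconds elapsed until the first successful connect,
    or None if `timeout` seconds pass without one.
    """
    start = time.monotonic()
    deadline = start + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        try:
            # Cap each attempt so a silently dropped SYN cannot overrun the deadline.
            with socket.create_connection((host, port), timeout=min(interval, remaining)):
                return time.monotonic() - start
        except OSError:
            # Refused, unreachable, or timed out: the VIP is not answering yet.
            time.sleep(interval)
```

Started right after FI-A is powered off and the VIP has stopped answering, seconds_until_reachable("172.10.15.50", 443, timeout=900) would return the outage length in seconds, or None if the VIP is still down after 15 minutes.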

Must be a bug!

By the way, did you set this?

admin->managementInterfaces->managementInterfaceMonitoringPolicy

Admin Status -> enable
Monitoring Mechanism -> MII Status

CSCum82888

After an upgrade to UCSM 2.2.1b, we see the following symptoms:
- No UCSM GUI access.
- Virtual IP is not reachable.
- Virtual IP cannot be accessed either by GUI or CLI/SSH.
- Individual FIs can be accessed via SSH but not via HTTP.

Conditions:
- Issue occurred after an upgrade to version 2.2.1b.

- The issue can happen under either of the following two conditions:
1. The default keyring is deleted and the system is upgraded to 2.2.1<x>.
2. The default keyring is deleted on a system running 2.2.1<x> and the system is rebooted.

Workaround:
- Make the key and certificate links used by Apache httpd point to any valid key/certificate (by deleting and re-creating the links). This requires loading the debug plugin.

The situation can be avoided by:
1. Not deleting the default keyring (or recreating it if it was deleted) before upgrading to 2.2.1<x>.
2. Not deleting the default keyring after upgrading to 2.2.1<x> either.

Further Problem Description:
The issue is caused by a deadlock.

I also assume it might be a bug.

Yes, Admin Status is enabled and Monitoring Mechanism is MII Status.

But the above setting didn't help either: when I restarted the primary FI, I lost ping to both FI-A and the VIP, and the VIP only responded again after FI-A came back up.

Hi Amit,

I hope you have seen the bug ID above!

Walter.

I am not hitting this bug, because these devices were shipped with 2.2(1b).

I am not sure what the issue could be.

Understood, but it can also happen under the second condition:

2. The default keyring is deleted on a system running 2.2.1<x> and the system is rebooted.

No, I haven't deleted the keyring, but I have rebooted the FIs multiple times.

I would open a TAC case and/or use Auto Install to upgrade the infrastructure to the latest release, or rather the long-lived release Cisco recommends (2.2(1d)).

I have raised an SR and a TAC engineer is working on it.

Meanwhile, I would like to understand the following two things:

  • How does failover happen between the fabric interconnects, i.e., how is the primary/subordinate role transferred from one FI to the other?
  • What is the expected time for the peer fabric interconnect to create the VIP sub-interface when the primary FI is rebooted or powered down?

 
