09-17-2012 02:57 AM - edited 03-01-2019 05:57 AM
This document describes several UCS failure scenarios and how to test them. The tests are useful for confirming that a UCS system behaves correctly when a component fails. A UCS system consists of one or two UCS 6100 Series fabric interconnects. A system with two fabric interconnects is typically deployed as an active-active pair that supports failover. The UCS 2100 Series fabric extender lets the UCS 6100 fabric interconnect provide all access-layer switching for the connected servers: server traffic is switched to its destination by the fabric interconnect, and the fabric extender itself does no switching. This document shows how to test the following failure scenarios using the UCS Manager CLI: a fabric interconnect failure, a blade server reboot, a server port failure, and an uplink port failure.
The CLI is organized into a hierarchy of command modes, with EXEC mode at the top of the hierarchy. Higher-level modes branch into lower-level modes. You use the create, enter, and scope commands to move from a higher-level mode to a mode in the next lower level, and you use the exit command to move up one level in the mode hierarchy. Each mode contains a set of commands that can be entered in that mode. Most of the commands available in each mode pertain to the associated managed object. Depending on your assigned role and locale, you may have access to only a subset of the commands available in a mode; commands to which you do not have access are hidden.
The CLI prompt for each mode shows the full path down the mode hierarchy to the current mode. This helps you to determine where you are in the command mode hierarchy, and it can be an invaluable tool when you need to navigate through the hierarchy.
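Because the prompt encodes the current mode path, it can also be parsed when test sessions are logged. The following is a minimal Python sketch, not part of UCS Manager: the function name parse_prompt and the regular expression are illustrative assumptions, based only on the prompt shapes that appear in the transcripts in this document. An asterisk in the prompt indicates that uncommitted changes are pending in the buffer.

```python
import re

# Assumed prompt shapes, taken from the transcripts in this document:
#   UCS1-FA-A#                                 (EXEC mode)
#   UCS1-FA-B(local-mgmt)#                     (local management shell)
#   UCS1-FA-A /eth-uplink/fabric/interface* #  (scoped mode, pending changes)
PROMPT_RE = re.compile(
    r"^(?P<host>[\w.-]+)"
    r"(?:\((?P<shell>[\w-]+)\))?"
    r"(?:\s+(?P<path>/[\w/-]*?)(?P<pending>\*)?)?"
    r"\s*#"
)


def parse_prompt(line):
    """Split a prompt into host, shell, mode path, and pending flag.

    Returns None when the line does not start with a recognizable prompt.
    """
    match = PROMPT_RE.match(line)
    if match is None:
        return None
    path = match.group("path") or "/"
    return {
        "host": match.group("host"),
        "shell": match.group("shell"),
        # One list entry per level below EXEC mode.
        "modes": [part for part in path.split("/") if part],
        # True when the prompt carries the uncommitted-changes asterisk.
        "pending": match.group("pending") is not None,
    }


info = parse_prompt("UCS1-FA-A /eth-uplink/fabric/interface* # commit-buffer")
print(info["modes"], info["pending"])
```

The length of the modes list is the number of exit commands needed to return to EXEC mode.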
The method here is to reboot one fabric interconnect, preferably the subordinate. The expected result is that the "show cluster state" command shows the primary fabric interconnect as "UP" and the subordinate fabric interconnect as "DOWN", and that the cluster reports "HA NOT READY".
Sample Test
On Fabric Interconnect A (Primary):
UCS1-FA-A# show cluster state
Cluster Id: 0x603bca7e0b7311e1-0x9987547fee1fd575
A: UP, PRIMARY
B: UP, SUBORDINATE
HA READY
On Fabric Interconnect B (Subordinate):
UCS1-FA-B(local-mgmt)# reboot
The switch will be rebooted. Are you sure? (yes/no):yes
Read from remote host 10.134.166.89: Connection reset by peer
Connection to 10.134.166.89 closed.
Verify on the Primary Fabric Interconnect:
UCS1-FA-A# show cluster state
Cluster Id: 0x603bca7e0b7311e1-0x9987547fee1fd575
A: UP, PRIMARY
B: DOWN, INAPPLICABLE
HA NOT READY
Peer Fabric Interconnect is down
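The check above can be automated once the command output has been captured as text. The following is a minimal Python sketch under that assumption; the function names parse_cluster_state and failover_detected are hypothetical, and the parsing relies only on the line layout shown in the sample output.

```python
def parse_cluster_state(output):
    """Return ({fabric: (state, role)}, ha_line) from 'show cluster state' text."""
    nodes = {}
    ha_line = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith(("A:", "B:")):
            fabric, _, rest = line.partition(":")
            state, _, role = rest.partition(",")
            nodes[fabric] = (state.strip(), role.strip())
        elif line.startswith("HA "):
            ha_line = line
    return nodes, ha_line


def failover_detected(output):
    """True when one fabric is UP and PRIMARY, the other is DOWN, and HA is not ready."""
    nodes, ha_line = parse_cluster_state(output)
    states = sorted(state for state, _ in nodes.values())
    has_primary = ("UP", "PRIMARY") in nodes.values()
    return states == ["DOWN", "UP"] and has_primary and ha_line == "HA NOT READY"


# Sample text copied from the verification step in this document.
AFTER_REBOOT = """\
Cluster Id: 0x603bca7e0b7311e1-0x9987547fee1fd575
A: UP, PRIMARY
B: DOWN, INAPPLICABLE
HA NOT READY
Peer Fabric Interconnect is down
"""

print(failover_detected(AFTER_REBOOT))
```

How the output is collected (for example, over an SSH session) is left out of the sketch on purpose.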
This method requires rebooting a UCS blade server. The expected result is that the Overall Status column in the "show server status" output shows "Power Off" while the blade is rebooting.
Sample Test
[UCS1-80 ~]$ reboot
Verify on the Primary Fabric Interconnect:
UCS1-FA-A# show server status
Server Slot Status Availability Overall Status Discovery
------- --------------------------------- ------------ --------------------- ---------
<--output omitted-->
10/1 Equipped Unavailable Ok Complete
10/2 Equipped Unavailable Ok Complete
10/3 Equipped Unavailable Ok Complete
10/4 Equipped Unavailable Ok Complete
10/5 Equipped Unavailable Ok Complete
10/6 Equipped Unavailable Ok Complete
10/7 Equipped Unavailable Ok Complete
10/8 Equipped Unavailable Power Off Complete
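To pick out the rebooting blade from a long listing, the captured output can be filtered by its Overall Status column. The following is a minimal Python sketch; the function name parse_server_status is hypothetical, and it assumes that the Slot Status and Availability values are single words, as they are in the sample output, so that everything between them and the final Discovery value is the Overall Status.

```python
def parse_server_status(output):
    """Parse data rows of 'show server status' text into dictionaries."""
    rows = []
    for raw in output.splitlines():
        parts = raw.split()
        # Data rows start with chassis/slot, for example "10/8".
        if len(parts) < 5 or "/" not in parts[0]:
            continue
        chassis, _, slot = parts[0].partition("/")
        if not (chassis.isdigit() and slot.isdigit()):
            continue
        rows.append({
            "server": parts[0],
            "slot_status": parts[1],
            "availability": parts[2],
            # Overall Status may contain spaces, for example "Power Off".
            "overall_status": " ".join(parts[3:-1]),
            "discovery": parts[-1],
        })
    return rows


# Two rows copied from the sample output in this document.
SAMPLE = """\
Server Slot Status Availability Overall Status Discovery
10/7 Equipped Unavailable Ok Complete
10/8 Equipped Unavailable Power Off Complete
"""

not_ok = [r["server"] for r in parse_server_status(SAMPLE) if r["overall_status"] != "Ok"]
print(not_ok)
```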
This method requires disabling a server port (interface), e.g. server port 1/9 on fabric interconnect B. The expected result is that the "show interface" command for that fabric's server ports shows the port's Admin State as "Disabled" and its Oper State as "Failed".
Sample Test
UCS1-FA-A# scope eth-server
UCS1-FA-A /eth-server # scope fabric b
UCS1-FA-A /eth-server/fabric # show interface
Interface:
Slot Id Port Id Admin State Oper State Lic State Grace Prd State Reason
---------- ---------- ----------- ---------------- -------------------- --------------- ------------
1 1 Enabled Up License Ok 0
1 10 Enabled Up License Ok 0
1 11 Enabled Up License Ok 0
<--output omitted-->
1 6 Enabled Up License Ok 0
1 7 Enabled Up License Ok 0
1 8 Enabled Up License Ok 0
1 9 Enabled Up License Ok 0
UCS1-FA-A /eth-server/fabric # enter interface 1 9
UCS1-FA-A /eth-server/fabric/interface # disable
UCS1-FA-A /eth-server/fabric/interface # commit-buffer
UCS1-FA-A /eth-server/fabric/interface # exit
To verify:
UCS1-FA-A /eth-server/fabric # show interface
Interface:
Slot Id Port Id Admin State Oper State Lic State Grace Prd State Reason
---------- ---------- ----------- ---------------- -------------------- --------------- ------------
1 1 Enabled Up License Ok 0
1 10 Enabled Up License Ok 0
1 11 Enabled Up License Ok 0
<--output omitted-->
1 6 Enabled Up License Ok 0
1 7 Enabled Up License Ok 0
1 8 Enabled Up License Ok 0
1 9 Disabled Failed Unknown 0 Admin config change
UCS1-FA-A /eth-server/fabric # exit
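The before and after states of the disabled port can be compared programmatically from the captured "show interface" text. The following is a minimal Python sketch; the function names find_port_row and port_is_failed are hypothetical, and the parsing assumes the row layout of the sample output (slot, port, and a one-word Admin State, followed by the remaining columns).

```python
def find_port_row(output, slot, port):
    """Return (admin_state, remaining_columns) for a slot/port, or None."""
    for raw in output.splitlines():
        line = raw.strip()
        # Stop before port-channel member rows, which use a different layout.
        if line.startswith("Member Port"):
            break
        parts = line.split()
        if len(parts) >= 4 and parts[0] == str(slot) and parts[1] == str(port):
            return parts[2], " ".join(parts[3:])
    return None


def port_is_failed(output, slot, port):
    """True when the port is administratively disabled and reports Failed."""
    row = find_port_row(output, slot, port)
    if row is None:
        return False
    admin_state, rest = row
    return admin_state == "Disabled" and rest.startswith("Failed")


# Rows copied from the sample output in this document.
BEFORE = "1 8 Enabled Up License Ok 0\n1 9 Enabled Up License Ok 0\n"
AFTER = "1 8 Enabled Up License Ok 0\n1 9 Disabled Failed Unknown 0 Admin config change\n"

print(port_is_failed(BEFORE, 1, 9), port_is_failed(AFTER, 1, 9))
```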
This method requires disabling an uplink port, e.g. uplink port 1/39 on fabric B. The expected result is that the "show interface" command for that fabric's uplink ports shows the port's Admin State as "Disabled" and its Oper State as "Admin Down".
Sample Test
UCS1-FA-A# scope eth-uplink
UCS1-FA-A /eth-uplink # scope fabric b
UCS1-FA-A /eth-uplink/fabric # show interface
Interface:
Slot Id Port Id Admin State Oper State Lic State Grace Period State Reason
---------- ---------- ----------- ---------------- -------------------- --------------- ------------
1 36 Disabled Admin Down Unknown 0 Administratively down
1 40 Disabled Admin Down Unknown 0 Administratively down
Member Port:
Port-channel Slot Port Oper State State Reason Lic State Grace Period
------------ ----- ----- --------------- ----------------------------------- -------------------- ------------
1 1 35 Up License Ok 0
1 1 39 Up License Ok 0
2 1 33 Sfp Not Present Unknown Unknown 0
2 1 37 Sfp Not Present Unknown Unknown 0
UCS1-FA-A /eth-uplink/fabric # enter interface 1 39
UCS1-FA-A /eth-uplink/fabric/interface* # disable
UCS1-FA-A /eth-uplink/fabric/interface* # commit-buffer
UCS1-FA-A /eth-uplink/fabric/interface # exit
To verify:
UCS1-FA-A /eth-uplink/fabric # show interface
Interface:
Slot Id Port Id Admin State Oper State Lic State Grace Period State Reason
---------- ---------- ----------- ---------------- -------------------- --------------- ------------
1 36 Disabled Admin Down Unknown 0 Administratively down
1 39 Disabled Admin Down Unknown 0 Administratively down
1 40 Disabled Admin Down Unknown 0 Administratively down
Member Port:
Port-channel Slot Port Oper State State Reason Lic State Grace Period
------------ ----- ----- --------------- ----------------------------------- -------------------- ------------
1 1 35 Up License Ok 0
2 1 33 Sfp Not Present Unknown Unknown 0
2 1 37 Sfp Not Present Unknown Unknown 0
UCS1-FA-A /eth-uplink/fabric # exit
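In the sample output, port 1/39 is listed under Member Port (port channel 1) before the test and under Interface afterward. The following is a minimal Python sketch that groups ports by section so that this move can be checked; the function name ports_by_section is hypothetical, and the parsing assumes the two section labels and row layouts shown in the sample output.

```python
def ports_by_section(output):
    """Map each section label to the set of (slot, port) pairs listed under it."""
    sections = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line in ("Interface:", "Member Port:"):
            current = line.rstrip(":")
            sections[current] = set()
            continue
        parts = line.split()
        if current is None or not parts or not parts[0].isdigit():
            continue  # headers, separators, and prompt lines
        # Member rows carry a leading port-channel id before slot and port.
        offset = 1 if current == "Member Port" else 0
        cols = parts[offset:offset + 2]
        if len(cols) == 2 and all(c.isdigit() for c in cols):
            sections[current].add((int(cols[0]), int(cols[1])))
    return sections


# Rows copied from the verification output in this document.
AFTER_DISABLE = """\
Interface:
1 36 Disabled Admin Down Unknown 0 Administratively down
1 39 Disabled Admin Down Unknown 0 Administratively down
1 40 Disabled Admin Down Unknown 0 Administratively down
Member Port:
1 1 35 Up License Ok 0
2 1 33 Sfp Not Present Unknown Unknown 0
2 1 37 Sfp Not Present Unknown Unknown 0
"""

result = ports_by_section(AFTER_DISABLE)
print((1, 39) in result["Interface"], (1, 39) in result["Member Port"])
```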
Understanding Fabric Failure and Failover in UCS
How to recover from a software failure on the 6120 Fabric Interconnect