Sandeep Singh
Level 7

Introduction

A UCS system has many different components that work together, and their seamless interaction is what makes UCS look coherent. However, a failure can happen at any point and needs to be dealt with. This document covers a broad set of failure situations, although not all of them.

Prerequisites

This document assumes that the reader has basic knowledge of UCS components (e.g., Fabric Interconnect, Fabric Extender, Chassis) and of techniques such as NIC teaming and SAN multipathing.

Understanding Fabric Failure

In a simple scenario of a UCS system with one server that has a CNA card, the following failures may occur:

a) FI failure : results in fabric failure for all connected UCS chassis

b) FEX failure : results in fabric failure for one UCS chassis

c) FI-FEX link failure : results in fabric failure for some of the servers within a UCS chassis (depending on number of servers and uplinks)

d) One CNA port failure : results in fabric failure for one server

In any of the above cases, downtime can be eliminated by using redundant hardware and proper configuration.
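The scope of each failure listed above can be sketched in a few lines of Python. This is only an illustration, not UCS Manager logic: the function names, the failure labels, the fully populated 8-slot chassis, and the round-robin mapping of blade slots to FEX uplinks are all assumptions made for the example.

```python
from __future__ import annotations

# Assumption: a fully populated chassis with 8 half-width blade slots.
SLOTS_PER_CHASSIS = 8


def slots_on_uplink(uplink: int, num_uplinks: int,
                    slots: int = SLOTS_PER_CHASSIS) -> list[int]:
    """Blade slots assumed to be pinned to one FI-FEX uplink (1-based).

    Assumption: slots are spread round-robin over the uplinks, so with
    2 uplinks the odd slots use uplink 1 and the even slots use uplink 2.
    """
    return [s for s in range(1, slots + 1)
            if (s - 1) % num_uplinks == uplink - 1]


def affected_servers(failure: str, chassis_count: int = 1,
                     num_uplinks: int = 2, failed_uplink: int = 1) -> int:
    """Number of servers that lose one fabric for failure types a) to d)."""
    if failure == "fi":            # a) all chassis connected to that FI
        return chassis_count * SLOTS_PER_CHASSIS
    if failure == "fex":           # b) one chassis
        return SLOTS_PER_CHASSIS
    if failure == "fi_fex_link":   # c) only the servers pinned to that link
        return len(slots_on_uplink(failed_uplink, num_uplinks))
    if failure == "cna_port":      # d) one server
        return 1
    raise ValueError(f"unknown failure type: {failure}")


if __name__ == "__main__":
    for kind in ("fi", "fex", "fi_fex_link", "cna_port"):
        print(kind, affected_servers(kind, chassis_count=3, num_uplinks=4))
```

The point of the sketch is that case c) depends on both the number of servers and the number of uplinks, while the other cases have a fixed scope.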

Understanding Failover

When redundant hardware and proper configuration are in place, any failure will result in failover. The behaviour described below applies to end-host mode only, because in switch mode the link status is not propagated.

a) One uplink of one FI fails : In this case UCS will re-pin the traffic to the remaining uplink of that FI.

b) Both uplinks of one FI fail or the FI fails : In this case the corresponding server links will be shut, since there is no uplink available on that FI. The FI will propagate link-down status to the adapter. Once the adapter sees link-down status, it is the responsibility of the operating system to move traffic to the remaining NIC/HBA. The exception here is adapters that support fabric failover, such as the M71KR and the M81KR (Palo).

c) One uplink of one FEX fails : In this case the server blades pinned to the failed uplink will have their links shut. This applies only to UCS systems that do not have the newer FEX and FI hardware and that run release 1.x or 2.x.

d) Both uplinks of one FEX fail or the FEX fails : In this case all adapters on that fabric will lose network/storage connectivity. If host-level redundancy is configured (NIC teaming and SAN multipathing), the traffic will be re-routed through the other FEX.

e) One adapter fails : If this is the only adapter, connectivity will be lost. If a redundant adapter is available and host-level redundancy is configured, the traffic will be re-routed through the other adapter. Some UCS adapters, such as the M71KR and M81KR, support fabric failover at the adapter level, which eliminates the need for host-level redundancy configuration (NIC teaming). As with NIC teaming, fabric failover will detect any failure between the adapter and the FI uplink. However, SAN fabric design must be taken into account for vHBA failover. In most situations, vHBA fabric failover is discouraged.
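The cases a) to e) above can be condensed into a small decision function. The following Python sketch is a simplified model of the end-host mode behaviour described in this section, for Ethernet vNICs only. The `Event` names, the `failover_outcome` function, and its result strings are hypothetical and do not correspond to any UCS Manager API.

```python
from enum import Enum


class Event(Enum):
    ONE_FI_UPLINK = "a"           # one uplink of one FI fails
    ALL_FI_UPLINKS_OR_FI = "b"    # both uplinks of one FI fail, or the FI fails
    ONE_FEX_UPLINK = "c"          # one uplink of one FEX fails
    ALL_FEX_UPLINKS_OR_FEX = "d"  # both uplinks of one FEX fail, or the FEX fails
    ADAPTER = "e"                 # one adapter fails


def failover_outcome(event: Event,
                     host_redundancy: bool = False,
                     fabric_failover: bool = False,
                     redundant_adapter: bool = False) -> str:
    """Return who recovers the traffic, or "outage" if nobody can.

    Assumptions: end-host mode, Ethernet traffic only, and for case c)
    the server in question is pinned to the failed FEX uplink.
    """
    if event is Event.ONE_FI_UPLINK:
        # The FI still has an uplink, so it re-pins the traffic itself.
        return "repin"
    if event is Event.ADAPTER:
        # A dead adapter cannot fail over; a second adapter plus
        # host-level redundancy (NIC teaming) is needed.
        if redundant_adapter and host_redundancy:
            return "host_failover"
        return "outage"
    # Cases b), c), d): the server link on one fabric goes down.
    if fabric_failover:
        return "fabric_failover"   # handled at the adapter level
    if host_redundancy:
        return "host_failover"     # handled by the operating system
    return "outage"


if __name__ == "__main__":
    print(failover_outcome(Event.ALL_FI_UPLINKS_OR_FI, fabric_failover=True))
    print(failover_outcome(Event.ALL_FEX_UPLINKS_OR_FEX, host_redundancy=True))
    print(failover_outcome(Event.ADAPTER))
```

The model deliberately leaves out vHBAs, because storage traffic normally relies on SAN multipathing on the host rather than on fabric failover.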

Related Information

How does UCS manager High Availability architecture works

UCS with a single fabric interconnect vs. dual fabric interconnect topology

Comments
compton18
Level 1

How about the scenario where you lose connectivity between FI-A and the WAN (the upstream switch fails)? FI-B still has connectivity to the WAN, and the CNA is set up for Fabric A with failover enabled for Fabric B.

Will UCS see the upstream outage and fail over to Fabric B in this scenario? Or does failover only work within UCS?

Thanks!

Keny Perez
Level 8

The server vNICs are pinned to uplink ports that have to be up/up. If a specific uplink port fails, the vNIC detects the failure, unpins itself from that port and tries another uplink. It eventually fails over to the other FI when all uplinks go down (in the case of dynamic pinning), or it fails over to the other FI immediately (in the case of LAN Pin Groups).

Bottom line, the vNIC can also fail over if the uplink it is pinned to goes down.

 

-Kenny

 

compton18
Level 1

Keny,

Thanks for your response!

 

Keny Perez
Level 8

Anytime :)

bo liu
Level 4

Hi,

Thanks for your doc.

I have two FI 6248s and one 5108.

I tested the failover:

First I shut down a service profile.

Then I rebooted FI-1.

During this time I tried to boot the server, but it would not boot and failed with a config failure error.

gustavomardones
Community Member

Hi,

How can I check whether my 5108 chassis is using all 4 available links to the FI without a re-acknowledge?

I guess it is using only 2 links because a 2-link chassis discovery policy is configured, but the UCS system is in a production environment and I can't perform a re-acknowledge yet.

Sandeep Singh
Level 7

Hi gustavo

The chassis discovery policy only specifies how many links are necessary for the chassis to be discovered, and initially these links come up. After this, if you re-acknowledge the chassis, all the links will come up. To check how many ports are being used, go to the Equipment tab in the UCSM GUI and expand Chassis > IO Modules > Fabric Ports.

eduardorives1
Level 1

I want to add that if your FIs are configured with Ethernet Mode: Switch and both uplinks on one FI fail, your traffic will be black-holed, as pinning is applicable only in End-Host mode. I experienced this in my Production "lab" :) last week and it was later confirmed by TAC.

 

I think this should be clarified in more detail in this post and in any other official documentation: "pinning is applicable only in End-Host mode", in my opinion, isn't scary enough and may not convey the actual consequences of this expected behavior.

 

Thanks!
