A UCS system has many components that work together, and it is their seamless cooperation that makes UCS appear as a single coherent system. However, a failure can occur at any point and has to be dealt with. This document covers a broad set of failure situations, although not every possible one.
This document assumes that the reader has basic knowledge of UCS components (e.g. Fabric Interconnect, Fabric Extender, chassis) and of techniques such as NIC teaming and SAN multipathing.
Understanding Fabric Failure
In a simple scenario of a UCS system with a server that has a CNA card, the following failures may occur:
a) FI failure: results in fabric failure for all connected UCS chassis
b) FEX failure: results in fabric failure for one UCS chassis
c) FI-FEX link failure: results in fabric failure for some of the servers within a UCS chassis (depending on the number of servers and uplinks)
d) Failure of one CNA port: results in fabric failure for one server
In any of the above cases, downtime can be eliminated by using redundant hardware and a proper configuration.
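The scope of each failure type above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topology (two chassis, eight servers each, two FI-FEX links per FEX), the names `Topology`, `pinned_uplink` and `affected_servers`, and the round-robin slot-to-uplink pinning are all hypothetical and not taken from any UCS tool or from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Toy model of one fabric (assumed sizes, not real defaults)."""
    chassis: int = 2              # chassis connected to the FI
    servers_per_chassis: int = 8  # blade slots per chassis
    fex_uplinks: int = 2          # FI-FEX links per FEX


def pinned_uplink(slot: int, uplinks: int) -> int:
    """Assumed pinning rule: slots are spread round-robin over the uplinks."""
    return (slot - 1) % uplinks


def affected_servers(failure, topo, chassis=1, link=0, slot=1):
    """Return the set of (chassis, slot) pairs that lose this fabric."""
    slots = range(1, topo.servers_per_chassis + 1)
    if failure == "fi":            # a) all connected chassis
        return {(c, s) for c in range(1, topo.chassis + 1) for s in slots}
    if failure == "fex":           # b) one chassis
        return {(chassis, s) for s in slots}
    if failure == "fi_fex_link":   # c) only servers pinned to that link
        return {(chassis, s) for s in slots
                if pinned_uplink(s, topo.fex_uplinks) == link}
    if failure == "cna_port":      # d) one server
        return {(chassis, slot)}
    raise ValueError(f"unknown failure type: {failure}")


if __name__ == "__main__":
    topo = Topology()
    for kind in ("fi", "fex", "fi_fex_link", "cna_port"):
        print(kind, len(affected_servers(kind, topo)))
```

With more FI-FEX links per FEX, fewer servers share each link, so a single link failure affects a smaller part of the chassis.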
When redundant hardware and a proper configuration are in place, any failure results in a failover. The behaviour described below applies to end-host mode only, since in switch mode the link status is not propagated.
a) One uplink of one FI fails: UCS re-pins the traffic to the remaining uplink of that FI.
b) Both uplinks of one FI fail, or the FI itself fails: the corresponding server links are shut down, since no uplink is available on that FI. The FI propagates the link-down status to the adapter. Once the adapter sees link-down, it is the responsibility of the operating system to move traffic to the remaining NIC/HBA. The exception is an adapter that supports fabric failover, such as the M71KR and the M81KR (Palo).
c) One uplink of one FEX fails: the server blades pinned to the failed uplink have their links shut down. Note that this applies only to UCS systems without the newer FEX and FI hardware, running 1.x or 2.x.
d) Both uplinks of one FEX fail, or the FEX itself fails: all adapters on that fabric lose network/storage connectivity. If host-level redundancy is configured (NIC teaming and SAN multipathing), the traffic is re-routed through the other FEX.
e) One adapter fails: if this is the only adapter, connectivity is lost. If a redundant adapter is available and host-level redundancy is configured, the traffic is re-routed through the other adapter. Some UCS adapters, such as the M71KR and M81KR, support fabric failover at the adapter level, which removes the need for host-level redundancy configuration (NIC teaming). As with NIC teaming, fabric failover detects any failure between the adapter and the FI uplink. However, SAN fabric design must be taken into account for vHBA failover; in most situations vHBA fabric failover is discouraged.
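The cases above can be condensed into a small decision function. This is a hedged sketch of the end-host-mode behaviour as described in this document, not Cisco logic: the failure labels, the `Outcome` enum and the `failover_outcome` function are hypothetical names, and treating case c) like the other fabric-path failures is an assumption, since the text does not say what happens after the links are shut.

```python
from enum import Enum


class Outcome(Enum):
    REPIN_SAME_FI = "UCS re-pins traffic to the remaining uplink of the FI"
    ADAPTER_FABRIC_FAILOVER = "adapter fails over to the other fabric"
    HOST_FAILOVER = "host re-routes via NIC teaming / SAN multipathing"
    CONNECTIVITY_LOST = "connectivity on the failed path is lost"


# Failures that take down the path between the adapter and the FI uplink
# (cases b, c and d). Including "fex_one_uplink" here is an assumption.
FABRIC_PATH_FAILURES = {
    "fi_all_uplinks", "fi",
    "fex_one_uplink",
    "fex_all_uplinks", "fex",
}


def failover_outcome(failure, *, fabric_failover=False,
                     host_redundancy=False, redundant_adapter=False):
    """Map a failure to its outcome in end-host mode (simplified)."""
    if failure == "fi_one_uplink":           # case a)
        return Outcome.REPIN_SAME_FI
    if failure in FABRIC_PATH_FAILURES:      # cases b), c), d)
        # Adapter-level fabric failover is modelled for vNIC traffic only;
        # the document discourages relying on it for vHBAs.
        if fabric_failover:
            return Outcome.ADAPTER_FABRIC_FAILOVER
        if host_redundancy:
            return Outcome.HOST_FAILOVER
        return Outcome.CONNECTIVITY_LOST
    if failure == "adapter":                 # case e)
        # Fabric failover cannot help when the adapter itself is gone.
        if redundant_adapter and host_redundancy:
            return Outcome.HOST_FAILOVER
        return Outcome.CONNECTIVITY_LOST
    raise ValueError(f"unknown failure type: {failure}")


if __name__ == "__main__":
    print(failover_outcome("fi", fabric_failover=True).value)
    print(failover_outcome("fex", host_redundancy=True).value)
    print(failover_outcome("adapter").value)
```

The order of the checks reflects the text: adapter-level fabric failover acts before the operating system is involved, and host-level redundancy is the fallback when the adapter does not support it.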