12-30-2010 01:49 PM - edited 03-01-2019 09:46 AM
1. The Cisco UCS system is a totally proprietary and closed system, meaning:
a) the Cisco UCS chassis cannot support other vendor’s blades. For example, you can’t place an HP, IBM or Dell blade in a Cisco UCS 5100 chassis.
b) The Cisco UCS can only be managed by the Cisco UCS manager – no 3rd party management tool can be leveraged.
c) Two Cisco 6100 Fabric Interconnects can indeed support 320 server blades (40 chassis, as Cisco claims), but only with an unreasonable amount of oversubscription. The more realistic number is two (2) 6100s for every four (4) 5100 UCS chassis (32 servers), which will yield a more reasonable oversubscription ratio of 4:1.
d) A maximum of 14 UCS chassis can be managed by the UCS manager, which resides in the 6100 Fabric Interconnects. This creates islands of management domains -- 14 chassis per island, which presents an interesting challenge if you indeed try to manage 40 UCS chassis (320 servers) with the same pair of Fabric Interconnects.
e) The UCS blade servers can only use Cisco NIC cards (Palo).
f) Cisco Palo cards use a proprietary version of interface virtualization and cannot support the open SR-IOV standard.
g) The Cisco 5100 chassis can only be uplinked to the Fabric Interconnects, so any shop that already has ToR switches will have to replace them.
I would really appreciate it if anyone can give me bulleted responses to these issues. I already posted this question on Brad Hedlund's web blog -- he really knows his stuff. But I know there are a lot of knowledgeable professionals on here, too.
Thanks!
12-30-2010 03:02 PM
@ex-engineer
Sounds like the people making these comments are new to the Unified Computing solution. Hopefully we can clarify any misunderstandings.
Answers inline.
1. The Cisco UCS system is a totally proprietary and closed system, meaning:
a) the Cisco UCS chassis cannot support other vendor’s blades. For example, you can’t place an HP, IBM or Dell blade in a Cisco UCS 5100 chassis.
[Robert] - True. This is standard in the industry. You can't put IBM blades in an HP c7000 chassis or vice versa, can you?
b) The Cisco UCS can only be managed by the Cisco UCS manager – no 3rd party management tool can be leveraged.
[Robert] - False. UCS has a completely open API. You can use XML, SMASH/CLP, IPMI, or WS-MAN. There are already applications that have built-in support for UCS from vendors such as HP, IBM (Tivoli), BMC (BladeLogic), Altiris, Netcool, etc. There's even a Microsoft SCOM plugin being developed. See here for more information: http://www.cisco.com/en/US/prod/ps10265/ps10281/ucs_manager_ecosystem.html
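For anyone who wants to see what that XML API looks like in practice, here is a minimal sketch in Python that logs in to UCS Manager and lists the blades it knows about. The aaaLogin / configResolveClass / aaaLogout calls and the computeBlade class are the documented UCSM XML API; the hostname and credentials are placeholders, and certificate checking is disabled for brevity, so treat it as a starting point rather than production code.

import ssl
import urllib.request
import xml.etree.ElementTree as ET

UCSM_URL = "https://ucsm.example.com/nuova"      # placeholder UCSM address
CTX = ssl._create_unverified_context()           # lab only: skip cert checks

def post(xml_body):
    """POST one XML API request to UCS Manager and return the parsed reply."""
    req = urllib.request.Request(UCSM_URL, data=xml_body.encode(),
                                 headers={"Content-Type": "application/xml"})
    with urllib.request.urlopen(req, context=CTX) as resp:
        return ET.fromstring(resp.read())

# Log in and keep the session cookie returned in the outCookie attribute.
login = post('<aaaLogin inName="admin" inPassword="password" />')
cookie = login.get("outCookie")

# Resolve every object of class computeBlade (one per blade in the system).
blades = post('<configResolveClass cookie="%s" classId="computeBlade" '
              'inHierarchical="false" />' % cookie)
for blade in blades.iter("computeBlade"):
    print(blade.get("dn"), blade.get("model"), blade.get("serial"))

# Always log out so the session does not linger.
post('<aaaLogout inCookie="%s" />' % cookie)

Any third-party tool managing UCS is ultimately issuing requests like these against UCSM.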
c) Two Cisco 6100 Fabric Interconnects can indeed support 320 server blades (40 chassis, as Cisco claims), but only with an unreasonable amount of oversubscription. The more realistic number is two (2) 6100s for every four (4) 5100 UCS chassis (32 servers), which will yield a more reasonable oversubscription ratio of 4:1.
[Robert] Your oversubscription rate can vary from 2:1 all the way to 8:1, depending on how many uplinks are in use with the current IOM hardware. With a 6120XP you can support up to 20 chassis (using a single 10G uplink between each chassis and the FI), assuming you're using the expansion slot for your Ethernet & FC uplink connectivity. You can support up to 40 chassis with the 6140XP in the same regard. Depending on your bandwidth requirements you might choose to scale this to at least 2 uplinks per IOM/chassis (20 Gb of redundant uplink bandwidth). This would give you 2 uplinks from each chassis to each Interconnect, supporting a total of 80 servers at an oversubscription rate of 4:1. Choosing the level of oversubscription requires an understanding of the underlying technology - 10G FCoE. FCoE is a lossless technology which provides greater efficiency in data transmission than standard Ethernet: no retransmissions and no dropped frames means higher performance and efficiency. Because of this efficiency you can tolerate higher oversubscription rates, since there is less chance of contention. Of course, each environment is unique. If you have some really high bandwidth requirements you can increase the uplinks between the FI & chassis. For most customers we've found that 2 uplinks never come close to being saturated. Your best bet is to analyze/monitor the actual traffic and decide what you require.
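To put rough numbers on that, here is a quick back-of-the-envelope sketch. It assumes 8 half-width blades per chassis, 10G of server-facing bandwidth per blade per fabric, and that only the FI's fixed ports (20 on a 6120XP, 40 on a 6140XP) are used for chassis links; other adapter or port configurations will change the figures.

BLADES_PER_CHASSIS = 8
SERVER_GBPS_PER_CHASSIS = BLADES_PER_CHASSIS * 10   # 80G server-facing per fabric

def oversubscription(uplinks_per_iom):
    """Server-facing bandwidth vs. IOM-to-FI uplink bandwidth (per fabric)."""
    return SERVER_GBPS_PER_CHASSIS / (uplinks_per_iom * 10)

def max_chassis(fi_fixed_ports, uplinks_per_iom):
    """How many chassis one FI can terminate on its fixed ports."""
    return fi_fixed_ports // uplinks_per_iom

for uplinks in (1, 2, 4):
    print("%d uplink(s)/IOM: %d:1 oversubscription, "
          "%d chassis on a 6120XP, %d chassis on a 6140XP"
          % (uplinks, oversubscription(uplinks),
             max_chassis(20, uplinks), max_chassis(40, uplinks)))
# 1 uplink  -> 8:1, 20 / 40 chassis (160 / 320 blades)
# 2 uplinks -> 4:1, 10 / 20 chassis ( 80 / 160 blades)
# 4 uplinks -> 2:1,  5 / 10 chassis ( 40 /  80 blades)

That lines up with the numbers above: single uplinks give the full 20/40-chassis counts at 8:1, while two uplinks per IOM give the 4:1 ratio the original question was aiming for.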
d) A maximum of 14 UCS chassis can be managed by the UCS manager, which resides in the 6100 Fabric Interconnects. This creates islands of management domains -- 14 chassis per island, which presents an interesting challenge if you indeed try to manage 40 UCS chassis (320 servers) with the same pair of Fabric Interconnects.
[Robert] False. Since the release of UCS we have limited the number of chassis supported. This is to ensure a controllable deployment in customer environments, and with each version of software released we're increasing that number. The number of chassis is theoretically limited only by the number of ports on the Fabric Interconnects (taking into account your uplink configuration). With the latest version, 1.4, the supported chassis count has been increased to 20. Most customers are still test-driving UCS and are nowhere near this limitation. Customers requiring more than this amount (or the full 40-chassis limit) can discuss it with their Cisco account manager for special consideration.
It's always funny how competitors comment on "UCS management islands". If you look at the competition and take into consideration the chassis, KVM, console/iLO/RAC/DRAC, Ethernet switch, and Fibre Channel switch management elements, UCS has a fraction of the management points when scaling beyond hundreds of servers.
e) The UCS blade servers can only use Cisco NIC cards (Palo).
[Robert] False. Any UCS blade server can use an Emulex CNA, a QLogic CNA, an Intel 10G NIC, a Broadcom 10G NIC, or ... our own Virtual Interface Card - aka Palo. UCS offers a range of options to suit various customer preferences.
f) Cisco Palo cards use a proprietary version of interface virtualization and cannot support the open SR-IOV standard.
[Robert] Palo is SR-IOV capable. Palo was deliberately designed not to be SR-IOV dependent. This removes dependencies on the OS vendors to provide driver support: since Cisco controls the card, Cisco can provide the drivers for the various OSs without relying on vendors to release patch/driver updates. Microsoft, Red Hat and VMware have all been certified to work with Palo.
g) The Cisco 5100 chassis can only be uplinked to the Fabric Interconnects, so any shop that already has ToR switches will have to replace them.
[Robert] Not necessarily true. The Interconnects are just that - they interconnect the chassis to your Ethernet & FC networks. The FIs act as the access switches for the blades and should connect into your distribution/core layer, if only because of their 10G interface requirements. I've seen people uplink UCS into a pair of Nexus 5000s which in turn connect to their data center core 6500s/Nexus 7000s. This doesn't mean you can't reprovision or make use of your ToR switches; you're just freeing up a heap of ports that would otherwise be required to connect a legacy, non-unified-I/O chassis.
Let us know if you have any further questions, and we'd be more than pleased to clear the air of myths.
Regards,
Robert
12-30-2010 06:07 PM
Robert, thank you very much for those most informative answers. I really appreciate it.
I have responded to your points in blue. Can you look them over right quick and give me your thoughts?
Thanks, again!
a) the Cisco UCS chassis cannot support other vendor’s blades. For example, you can’t place an HP, IBM or Dell blade in a Cisco UCS 5100 chassis.
[Robert] - True. This is standard in the industry. You can't put IBM blades in an HP c7000 chassis or vice versa, can you?
I believe the Dell blade chassis can support blades from HP and IBM. I would have to double-check that.
b) The Cisco UCS can only be managed by the Cisco UCS manager – no 3rd party management tool can be leveraged.
[Robert] - False. UCS has a completely open API. You can use XML, SMASH/CLP, IPMI, or WS-MAN. There are already applications that have built-in support for UCS from vendors such as HP, IBM (Tivoli), BMC (BladeLogic), Altiris, Netcool, etc. There's even a Microsoft SCOM plugin being developed. See here for more information: http://www.cisco.com/en/US/prod/ps10265/ps10281/ucs_manager_ecosystem.html
This is very interesting. I had no idea that the Cisco UCS ecosystem can be managed by other vendors' management solutions. Can a 3rd party platform be used in lieu of UCS Manager (as opposed to just using 3rd party plug-ins)? Just curious...
c) Two Cisco 6100 Fabric Interconnects can indeed support 320 server blades (40 chassis, as Cisco claims), but only with an unreasonable amount of oversubscription. The more realistic number is two (2) 6100s for every four (4) 5100 UCS chassis (32 servers), which will yield a more reasonable oversubscription ratio of 4:1.
[Robert] Your oversubscription rate can vary from 2:1 all the way to 8:1, depending on how many uplinks are in use with the current IOM hardware. With a 6120XP you can support up to 20 chassis (using a single 10G uplink between each chassis and the FI), assuming you're using the expansion slot for your Ethernet & FC uplink connectivity. You can support up to 40 chassis with the 6140XP in the same regard. Depending on your bandwidth requirements you might choose to scale this to at least 2 uplinks per IOM/chassis (20 Gb of redundant uplink bandwidth). This would give you 2 uplinks from each chassis to each Interconnect, supporting a total of 80 servers at an oversubscription rate of 4:1. Choosing the level of oversubscription requires an understanding of the underlying technology - 10G FCoE. FCoE is a lossless technology which provides greater efficiency in data transmission than standard Ethernet: no retransmissions and no dropped frames means higher performance and efficiency. Because of this efficiency you can tolerate higher oversubscription rates, since there is less chance of contention. Of course, each environment is unique. If you have some really high bandwidth requirements you can increase the uplinks between the FI & chassis. For most customers we've found that 2 uplinks never come close to being saturated. Your best bet is to analyze/monitor the actual traffic and decide what you require.
Actually, I should have checked this out for myself. What I posted was preposterous and I should have spotted that right off the bat. And you should have hung me out to dry for it! :-) Correct me if I'm wrong, but two (2) 6140 FICs can handle A LOT more than just 4 UCS blade chassis, as I stated earlier. In fact, if a 4:1 OS ratio is desired (as in my question), two 6140 FICs can handle 20 UCS chassis, not 4. Each chassis will have 4 total uplinks to the 6140s - 2 uplinks from each FEX to its FIC. That equates to 160 servers.
If 320 servers are desired, the OS ratio will have to go up to 8:1 - each chassis with 2 uplinks, one from each FEX to its FIC.
Is all this about OS ratios correct?
d) A maximum of 14 UCS chassis can be managed by the UCS manager, which resides in the 6100 Fabric Interconnects. This creates islands of management domains -- 14 chassis per island, which presents an interesting challenge if you indeed try to manage 40 UCS chassis (320 servers) with the same pair of Fabric Interconnects.
[Robert] False. Since the release of UCS we have limited the number of chassis supported. This is to ensure a controllable deployment in customer environments, and with each version of software released we're increasing that number. The number of chassis is theoretically limited only by the number of ports on the Fabric Interconnects (taking into account your uplink configuration). With the latest version, 1.4, the supported chassis count has been increased to 20. Most customers are still test-driving UCS and are nowhere near this limitation. Customers requiring more than this amount (or the full 40-chassis limit) can discuss it with their Cisco account manager for special consideration.
It's always funny how competitors comment on "UCS management islands". If you look at the competition and take into consideration the chassis, KVM, console/iLO/RAC/DRAC, Ethernet switch, and Fibre Channel switch management elements, UCS has a fraction of the management points when scaling beyond hundreds of servers.
I understand. Makes sense.
e) The UCS blade servers can only use Cisco NIC cards (Palo).
[Robert] False. Any UCS blade server can use an Emulex CNA, a QLogic CNA, an Intel 10G NIC, a Broadcom 10G NIC, or ... our own Virtual Interface Card - aka Palo. UCS offers a range of options to suit various customer preferences.
Interesting. I didn't know that.
f) Cisco Palo cards use a proprietary version of interface virtualization and cannot support the open SR-IOV standard.
[Robert] Palo is SR-IOV capable. Palo was deliberately designed not to be SR-IOV dependent. This removes dependencies on the OS vendors to provide driver support: since Cisco controls the card, Cisco can provide the drivers for the various OSs without relying on vendors to release patch/driver updates. Microsoft, Red Hat and VMware have all been certified to work with Palo.
Correct, SR-IOV is a function of the NIC and its drivers, but it does need support from the hypervisor. That having been said, can a non-Cisco NIC (perhaps one of the ones you mentioned above) that supports SR-IOV be used with a Cisco blade server in the UCS chassis?
g) The Cisco 5100 chassis can only be uplinked to the Fabric Interconnects, so any shop that already has ToR switches will have to replace them.
[Robert] Not necessarily true. The Interconnects are just that - they interconnect the chassis to your Ethernet & FC networks. The FIs act as the access switches for the blades and should connect into your distribution/core layer, if only because of their 10G interface requirements. I've seen people uplink UCS into a pair of Nexus 5000s which in turn connect to their data center core 6500s/Nexus 7000s. This doesn't mean you can't reprovision or make use of your ToR switches; you're just freeing up a heap of ports that would otherwise be required to connect a legacy, non-unified-I/O chassis.
I understand what you mean, but if a client has a ToR design already in use, those ToRs must be ripped out. For example, let's say they had Brocade B-8000s at the ToR; it's not as if they can keep them in place and connect the UCS 5100 chassis to them. The 5100 needs the FICs.
Regards,
Joe
12-30-2010 06:57 PM
Joe,
UCS Manager (UCSM) runs directly on the Interconnects. There's no additional management software to install anywhere. Even if you use a 3rd party management suite to manage UCS, it's actually interacting with UCSM. I've seen many customers use software suites like BMC BladeLogic for complete UCS automation to great effect. As UCS matures you'll find more and more of these enterprise management suites including support for managing UCS.
The OS ratios are correct. As for the max number of servers a UCS system can manage, that's strictly dependent on the Fabric Interconnect port count. The 2nd generation FIs (coming out next year) will have upwards of 96 ports. This will push the "320 managed servers" count even higher.
With UCS you can have some chassis with 2 uplinks and others with 4 uplinks, for example for bandwidth-intensive applications. Everything in UCS is policy based and stateless. You could create a server pool for high-bandwidth servers that is a collection of blades from the 4-uplink chassis, and regular server pools that are collections of blades from the 2-uplink chassis. After that's set up, you simply point your service profiles to the appropriate pool and you'll know your high-end servers get the better OS ratio.
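If you'd rather script that pool setup than click through UCSM, the (much newer) Cisco ucsmsdk Python library drives the same XML API. The pool name, chassis, and slot numbers below are made up, and the exact class and parameter names are worth double-checking against the SDK documentation, but the flow looks roughly like this:

from ucsmsdk.ucshandle import UcsHandle
from ucsmsdk.mometa.compute.ComputePool import ComputePool
from ucsmsdk.mometa.compute.ComputePooledSlot import ComputePooledSlot

handle = UcsHandle("ucsm.example.com", "admin", "password")   # placeholders
handle.login()

# A pool of blades drawn only from the 4-uplink ("high bandwidth") chassis,
# assumed here to be chassis 1 and 2.
pool = ComputePool(parent_mo_or_dn="org-root", name="HighBW-Pool")
for chassis_id in ("1", "2"):
    for slot_id in ("1", "2", "3", "4", "5", "6", "7", "8"):
        ComputePooledSlot(parent_mo_or_dn=pool,
                          chassis_id=chassis_id, slot_id=slot_id)

handle.add_mo(pool)
handle.commit()
handle.logout()

# Service profiles (or a service profile template) are then associated with
# "HighBW-Pool", so any profile drawn from it lands on a 4-uplink chassis.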
As for SR-IOV support, I'm not 100% sure about it being supported in the other CNAs. I don't believe the current generation of CNAs from QLogic or Emulex supports it, but that could change with a simple firmware upgrade.
Regarding the ToR design, you're correct: you only need a single pair of 6100s in one rack to connect ALL your chassis - they are an essential component of the system. The default operation of UCS is End-Host mode. This mode eliminates the need to run spanning tree and allows all uplinks to be active/active. Having a unified architecture is the design advantage of UCS over competitors that require multiple management, LAN, SAN & KVM connections from each chassis into intermediate switches. This saves on cabling, cooling, power, etc. The only external connections UCS needs are to your LAN and SAN core switches - a wire-once architecture. Adding compute power is as simple as connecting additional chassis uplinks to the existing Interconnects. This removes a great deal of design complexity when planning your infrastructure for future expansion.
Remember to think of the two Interconnects and all the chassis as a single UCS "system".
Regards,
Robert
12-31-2010 01:38 PM
Robert, excellent stuff. Rated again, my friend.
As for SR-IOV, there are indeed several CNAs that support it. I believe Emulex and Qlogic, as well as the Intel 82576.
This leads to another question...Cisco offers VN-Link technology through software (1000v) or through hardware using VN-Tag. And VN-Link is supported by both Nexus ToRs and the UCS FIC. Before I ask my question, is all this correct?
So, what if a shop decides that they do not want to take the Cisco proprietary VN-Link approach but instead want to use the open-standards SR-IOV/VEPA approach? For example, a client would like to use a QLogic SR-IOV-enabled CNA and then leverage VEPA (when it's ratified) to map an SR-IOV VF to a vEth port on the adjacent bridge (or FIC, in the case of UCS) -- will Cisco's UCS allow this design approach? Moreover, will the FIC and the Nexus switches support VEPA?
Thank you so much for all this help. Trust me, there are a lot of people out there who have these questions. Perhaps you can use my questions to create a Q&A thread on here...?
01-01-2011 02:07 AM
Great questions.
I'll address the points I can, but I'll rope in some experts on VN-Link into this thread to address the rest.
As far as I know, QLogic's 3rd generation CNAs will start supporting SR-IOV (8200 series). The current 8000/8100 series CNAs available for UCS (M71KR and M72KR) will not support SR-IOV. As for Emulex, I don't believe their existing OneConnect adapters support SR-IOV either, though I know some of their 10GE adapters do. Either way, I'm sure they're not far off, as the standards are nearly ratified.
VN-Link in SW is provided by the Nexus 1000v.
VN-Link in HW is provided by the UCS Pass Through Switch (PTS).
One benefit of VN-Link in SW (with use of a hypervisor) is that no tag is added to frames when packets traverse outside of the vSphere hosts. A minor disadvantage of this approach is the CPU/memory overhead required - which really isn't that significant on today's servers. The proposed standards around VEPA will require a "tag" to be added to packets so upstream switches can identify and process them accordingly, and switches will need to include support for new functionality (such as reflective relay).
As the standards for Bridge Port Extension and Edge Virtual Bridging are finalized, I'm confident Cisco will include support in our DC switches. The current implementation of VN-Link is very similar to the proposed drafts of these standards. Just like most technologies Cisco pioneered (DCE, PAgP, HSRP, CDP, PoE), we build in support for the industry-standard versions as well (DCB, LACP, VRRP, LLDP, 802.3af) once they're ratified. As to the extent of support within UCS, I'll have to investigate further before responding. Some of this information might be under NDA, but anything I find that I can share, I will update this post with.
In closing, Cisco has taken (and probably always will take) flak from competitors over "proprietary" technology. The fact of the matter is, Cisco is usually at the forefront of innovation and is usually one of the major contributors to IEEE standards. We take pride in pushing technology out so customers can take advantage of the benefits, and we'll usually update our technologies to become standards compliant. Take DCE/CEE/FCoE for example: Cisco Nexus 5000 switches were the first enterprise switches to offer FCoE, and as the FCoE standard was ratified that year, we updated our switch software to fully support the T11 standard that Cisco greatly contributed to formalizing.
Here's a link to the two standards involved here:
802.1Qbg - Edge Virtual Bridging
802.1Qbh - Bridge Port Extension
Regards,
Robert