
Mixed Server Architectures

visitor68
Level 4

Hi, folks:

With this question, I would like to engender a discussion that is -- hopefully -- not vendor-centric, but engineering-centric, instead.

What are the virtues of deploying a data center with a uniform server access architecture across the entire data center? I'm thinking of the mixed environment that results when deploying rack mount servers in an N5K system (by system, I mean 10G 2232 FEXs plus 5K, for example) along with deploying chassis-based servers that would typically have their I/O aggregated. In that case, the aggregated I/O of the blade chassis cannot be connected to a host port of a 2232 FEX. So, what results is a separate architecture and topology to support blade servers.

Furthermore, in a data center in which abstraction layers have been defined across the entire data center for stateless computing, application mobility, and virtualized storage, it seems that there is a definable advantage to maintaining a server access architecture that is uniform in terms of access speed, oversubscription, and traffic patterns. It makes for easier troubleshooting and helps create more definable and predictable SLAs, which is a great concern in the age of SaaS and IaaS.

Any thoughts from anyone would be greatly appreciated.

Regards

9 Replies

Surya ARBY
Level 4

Hi.

The right topology depends on the type of servers you have to connect.

For rack mount servers running at 1G, we plan to use 2224 FEXs (max 16 servers per rack, dual homing for production NICs and one 1GE NIC for backup, if LAN-based backup is required).

For blade chassis, the problem is different because the right topology depends on the type of I/O interconnect you want to use. In the case of UCS, the Fabric Interconnect is the access layer, and it has to be connected to the aggregation switches.

For IBM, if running Catalyst Blade Switches with Spanning Tree, we connect these to the aggregation; if you want to use the Nexus 4000 with FCoE, connecting the 4k to the 5k is mandatory.

For HP, if running CBS, we connect these to the aggregation (because of STP); if running HP Virtual Connect (Flex10 / FlexFabric), these devices are virtualized to appear to the network like a host (no STP, no external bridging, but able to form 802.3ad EtherChannels). With Virtual Connect the access is virtualized like an ESX host, and those modules can be connected to FEXs.

Another option: if you use CBS, you can run Flexlinks instead of STP and then connect your CBS to the Nexus 2k.

There is no generic answer to "how do I connect my servers" and "how do I build the organization of my room".

For our new cloud services (very large company, thousands of servers expected in the new infrastructure), we will connect everything to new Nexus 55xx switches, using a top-of-rack architecture for rack mount servers and providing low-cost 10GE access on the 55xx for blade 10GE uplinks (no passthrough modules, except for private networks dedicated to clusters). In short: top-of-rack for rack mount servers, middle-of-row for 10GE access with the Nexus 55xx, and an out-of-band administration network.

The trend seems to be making blade server access agnostic to the network (UCS, Virtual Connect, or connecting a Nexus 4k to a single parent switch to maintain a dual-fabric topology, so no STP is needed), so that the blades can be connected to the access at end-of-row / middle-of-row if needed.

Surya, thanks for your detailed answers. Much appreciated.

I agree that there is no generic answer for the right way to connect servers in a data center. There are a lot of variables to consider. I am not asking for a generic answer about that. Instead, I am putting forward some ideas/thoughts/considerations with regard to uniformity in the server access layer architecture and topology of the data center, and I would like to engender a discussion accordingly.

I should have added as a caveat that I am focusing on a unified (converged) fabric infrastructure. So, the Cisco CBS switches and any other vendor solution that does not support FCoE/DCB are out, regardless of the chassis vendor. That leaves the UCS as a unified platform (compute and switching) in and of itself, the N4K blade switch for the IBM chassis, or using 10G Ethernet pass-throughs.

But again, I am less interested in having a low-level, product-oriented discussion and more interested in having a higher-level discussion that elaborates on the virtues (or lack thereof) of deploying a uniform architecture and topology in terms of creating predictable SLAs, streamlined troubleshooting, and deterministic traffic patterns and failover. Design philosophy. It seems that in a data center where applications are roaming from compute platform to compute platform -- oftentimes in an automated fashion -- there is a need for consistency in oversubscription, bandwidth, and resource contention, which can only truly be provided by designing homogeneity into the infrastructure.

Thanks

Sorry, I misunderstood your first message.

For us, keeping the infrastructure simple (to design and to operate) is mandatory because we outsource production (and the engineering as well; I'm a contractor). We outsource, but we act as a provider for all our branches and subsidiaries.

We defined profiles for rows and racks (for racks: one type for Unix, one type for rack mount -- pizza box -- servers, one type for blades), with only a single vendor for Unix and a single vendor for x86. We are taking a look at unified I/O -- FCoE -- but I doubt we will go for it quickly. Dual homing and redundancy for everything (switches / NICs / power supplies...). As we try to promote virtualization, we expect at least 80% blade chassis for the x86 servers. All the new applications will run on blades, and this has been presented as an internal standard.

Everything needs to be standardized for us because otherwise it becomes unmanageable at the size of the infrastructure. About oversubscription, I don't really take it into account, as multiple short-reach 10GE links are not so expensive and high-density 10GE switches are coming to the market quickly, but the same oversubscription rate is defined for each perimeter (20GE for each I/O module in a blade chassis, 80GE out of each row to the core). For me: no QoS, but oversizing (which is already a form of QoS). Also, we will go to a middle-of-row architecture in each row, with no cross-row cabling except for uplinks from the access to the distribution/core (and the OOB network). Each rack and each row can be seen as a single entity from the network.
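
To give a rough idea of how those perimeter ratios work out, here is a back-of-the-envelope sketch in Python. The 20GE-per-I/O-module and 80GE-per-row figures are the ones quoted above; the blade and chassis counts are just illustrative assumptions, not our real numbers:

```python
# Rough oversubscription math for the "same ratio per perimeter" rule above.
# The 20GE per I/O module and 80GE per row figures come from the post; the
# blade and chassis counts below are illustrative assumptions.

def oversubscription(host_facing_gbps: float, uplink_gbps: float) -> float:
    """Ratio of host-facing bandwidth to uplink bandwidth (N:1)."""
    return host_facing_gbps / uplink_gbps

# Chassis perimeter: assume 16 half-height blades, each with a 10GE CNA port
# per fabric, behind one I/O module with 20GE of uplinks.
blades_per_chassis = 16
chassis_ratio = oversubscription(blades_per_chassis * 10, 20)

# Row perimeter: assume 8 chassis per row, each handing 20GE per fabric
# toward the middle-of-row switches, with 80GE from the row to the core.
chassis_per_row = 8
row_ratio = oversubscription(chassis_per_row * 20, 80)

print(f"chassis I/O module: {chassis_ratio:.0f}:1")  # 8:1
print(f"row to core:        {row_ratio:.0f}:1")      # 2:1
```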

We banned passthrough modules (with exceptions: clusters) because we don't want to manage a 1-to-1 cabling map for the blades inside a chassis.

To keep the infrastructure simple to operate and troubleshoot, we defined internal standards (design rules) for everything (network, servers, storage, server virtualization...). We also worked on standard rules for DC interconnect (rules about VLAN extension, STP isolation...).

Thanks, Surya. Good information.

With regard to 10G DCB-enabled pass-throughs, off the bat one would think that there isn't much value in that approach, as you don't get the cable and port consolidation that is typically a by-product of deploying blades. However, looking into it further, I think 10G pass-throughs may very well be a feasible solution. In the case where 2232 FEXs are deployed at the ToR, pass-throughs are the only choice if one wants to maintain homogeneity in access design and infrastructure.

Given that FEX ports are relatively inexpensive, consuming a port per blade CNA may be a small price to pay for cutting the access oversubscription of chassis-based servers in half or even to a quarter, while creating uniformity across the data center and all the predictability and flexibility that brings. In highly virtualized environments (perhaps anywhere between 20 and 40 VMs per ESX server), that dedicated access bandwidth may be quite necessary.

So, imagine the case in which a VM running a mission-critical application on a rack mount server that enjoys a dedicated network access port for its CNA is FT'd (or vMotioned) over to a blade server with a 4:1 oversubscription ratio. Now add the bandwidth requirement of FCoE traffic. How will that impact the performance of the application? How will the client's experience and the promised SLA be impacted? One can never really know unless extensive testing is done on each and every application, which we all know is never going to happen in any data center with hundreds of apps running. With regard to the cables needed (twinax), they will be localized to the rack itself, so there's no effect on quantity with regard to inter-rack connectivity. No rat's nest, so to speak.
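
Just to put rough numbers on that scenario: the 4:1 ratio and 20-40 VMs per host are from the discussion above, while the VM count I picked, the FCoE bandwidth share, and the assumption that every host is busy at once are purely illustrative:

```python
# Back-of-the-envelope for the scenario above: a VM moves from a rack mount
# server with a dedicated 10GE access port to a blade behind a 4:1
# oversubscribed uplink. The FCoE share and VM count are assumptions.

link_gbps = 10.0            # CNA port speed
oversubscription = 4        # blade access oversubscription (4:1, from above)
vms_per_host = 30           # mid-range of the 20-40 VMs per ESX host figure
fcoe_share = 0.4            # assumed fraction of the wire reserved for FCoE

# Worst case: every host behind the shared uplink is pushing traffic at once.
effective_host_gbps = link_gbps / oversubscription
lan_gbps_on_rack  = (link_gbps * (1 - fcoe_share)) / vms_per_host
lan_gbps_on_blade = (effective_host_gbps * (1 - fcoe_share)) / vms_per_host

print(f"per-VM LAN bandwidth on the rack mount host: {lan_gbps_on_rack*1000:.0f} Mb/s")
print(f"per-VM LAN bandwidth on the blade, worst case: {lan_gbps_on_blade*1000:.0f} Mb/s")
```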

Lastly, for anyone deploying HP or Dell blades, the only choice is pass-throughs if a ToR FCF (or, in the case of 2232s at the ToR, an EoR FCF) design is to be maintained. Otherwise, Dell has a pretty cool chassis-based FCoE FCF solution that terminates FCoE traffic right at the chassis itself. IBM blades do have the N4K, but first, they cannot be connected to the 2232 FEX, so there goes homogeneity in access design, and second, they are actually pretty expensive. They also present relatively complex management points, as opposed to a blade I/O module that does nothing more than pass FCoE traffic to the ToR.

I know I have given you a lot, so please take your time in answering. No rush here.

Thanks!

In my environment I don't have the requirement to put an FC switch at the top of rack; for the SAN we only work with MoR / EoR. But from a financial perspective I'm not sure it makes sense: two 10GE passthroughs + two Nexus 2232s and optics. We did the exercise, and in the end, embedded I/O modules terminating FCoE at the chassis level were more cost effective (one of our standards is not to use FCoE beyond the access layer, mainly for management considerations between three different teams, so we have to integrate traditional FC switches to connect all the chassis).
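
The shape of that exercise looks something like the sketch below. Every price is a placeholder to be replaced with your own quotes, and the cable/optic counts are assumptions; the point is only which components each option has to pay for:

```python
# Sketch of the per-chassis cost exercise described above: 10GE passthrough
# modules plus a pair of Nexus 2232s (and cabling/optics) versus embedded I/O
# modules that terminate FCoE at the chassis. All prices are placeholders.

blades = 16  # assumed blades per chassis, dual-homed

passthrough_option = (
    2 * 1500           # 2 x 10GE passthrough modules (placeholder price)
    + 2 * 12000        # 2 x Nexus 2232 FEX (placeholder price)
    + 2 * blades * 90  # twinax from every blade port to the FEXs
    + 4 * 400          # FEX fabric uplink optics/cables
)

embedded_fcoe_option = (
    2 * 14000          # 2 x embedded FCoE-terminating I/O modules (placeholder)
    + 4 * 400          # native FC + Ethernet uplinks out of the chassis
)

print(f"passthrough + ToR FEX, per chassis:    ~{passthrough_option:,}")
print(f"embedded FCoE termination, per chassis: ~{embedded_fcoe_option:,}")
```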

Finally, even though we use two different rack profiles (rack mount vs blade), we are trying to promote the blade chassis approach, and we want the ToR / rack mount server approach to be the exception rather than the standard.

About FCoE and oversubscription / QoS management: for the moment, the best practice is to leverage the CNA from the server to the access, but beyond the first hop use dedicated Ethernet links for LAN traffic and dedicated Ethernet links for FCoE-only traffic, so the LAN team keeps its links and the SAN team keeps theirs.

If you follow the "best practice" of dedicating FCoE links, oversubscription ends up being managed like it was in the past, considering that storage arrays work at 8G FC max for the moment and 40GE is on the radar for the Nexus 7000 and 55xx. Also, with interface speeds increasing (10GE, 40GE soon...) and costs decreasing, to me the question is "does it make sense to care about it?" My answer is no. Given the applications I host and the throughput I see in my data centers, I don't care. Exception: we have a supercomputer (in the top 100 of top500.org), but it will stay on InfiniBand for the moment.
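
A rough fan-in check shows why I don't worry much about it: the storage traffic is ultimately bounded by the array's 8G FC ports, not by the host-side 10GE links. The host and array port counts here are assumptions for illustration only:

```python
# Rough fan-in behind the "managed like it was in the past" point: compare
# aggregate host-side FCoE bandwidth with the array's 8G FC ports.
# Host and array port counts are illustrative assumptions.

hosts = 64                 # hosts zoned to one set of array ports (assumed)
host_fcoe_gbps = 10.0      # dedicated FCoE link per host toward the access
array_ports = 8            # 8G FC ports on the array (assumed)
array_port_gbps = 8.0

fan_in = (hosts * host_fcoe_gbps) / (array_ports * array_port_gbps)
print(f"host-to-array fan-in: {fan_in:.0f}:1")  # 10:1 with these assumptions
```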


A few words about management when building a cloud: when you have service profiles (like what you can find with UCS, HP Virtual Connect...), the replacement of a failed blade is totally transparent, and deploying new servers with profiles is far more efficient than waiting for the LAN / SAN team to configure the Ethernet port, zoning, etc. through the change management process. In my opinion, all the solutions based on service profiles are far better for a cloud-based solution because of the agility and speed of deployment they provide (it is not totally transparent with passthrough modules; I'm not sure whether the Nexus 55xx supports FlexAttach to virtualize the WWNs?).

Hope this helps.

Surya, you make a very interesting point with regard to FCoE termination. I happen to agree that there is value in terminating the FCoE traffic at the chassis level, with, say, a Dell M8428-K FCF. For one, the value proposition with regard to cable consolidation is already in play just by using blades in the first place, and the I/O consolidation comes from replacing 1G NICs and HBAs with a dual-port 10G CNA. So there is definitely something beneficial to this approach: it exploits the value props of cable and I/O consolidation WITHOUT having to move the converged traffic to the ToR, only to separate the LAN and SAN traffic at the ToR and send each to its respective network. One may ask: what then was the benefit of expending anywhere between 20 and 50 percent of the uplink bandwidth between the chassis and the ToR just to dump the FC traffic there? And I agree with that.
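
Just to put a quick count behind that consolidation point -- the per-server adapter counts here are typical assumptions, not figures from this thread:

```python
# Quick count behind the I/O consolidation argument above: a legacy server
# with several 1G NICs plus two FC HBAs versus the same server with a single
# dual-port 10G CNA. Adapter counts per server are illustrative assumptions.

legacy_nics, legacy_hbas = 6, 2        # assumed per-server adapter ports
servers_per_rack = 16

legacy_cables = servers_per_rack * (legacy_nics + legacy_hbas)
cna_cables = servers_per_rack * 2      # dual-port 10G CNA per server

legacy_gbps = legacy_nics * 1 + legacy_hbas * 8   # 1GE NICs + 8G FC HBAs
cna_gbps = 2 * 10                                  # converged 10GE ports

print(f"cables per rack: {legacy_cables} -> {cna_cables}")
print(f"per-server raw bandwidth: {legacy_gbps} Gb/s -> {cna_gbps} Gb/s")
```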

The reality, though, is that many architects have already bought into the Cisco narrative of terminating FCoE at the ToR and eventually extending the unified fabric to the EoR and the core. The endgame is the eventual replacement of the FC SAN with an FCoE SAN. And that doesn't fit into the chassis-based FCoE termination paradigm. I am very surprised that you are thinking about this, or actually doing this, in your shop! I like people who think outside the box.

As for oversubscription and 40G as the remedy, what is it going to take to upgrade all of one's server CNAs, switch ports/FEX ports, and (perhaps) twinax cables to support 40G? It's not as if one only has to do a software upgrade on a device and be done with it. So, the 40G argument has limited merit, I think.

With regard to management and service profiles, I assume you mean a virtual infrastructure manager that can create server profiles/personas for different applications and apply them to commoditized hardware at will. If so, I agree. Actually, Dell has a great solution that used to belong to Scalent until Dell bought them: AIM (Advanced Infrastructure Manager). There is also an orchestration tool called VIS Creator. I know Cisco has similar functionality with UCS Manager.

Some quick comments

"The reality, though, is that many architects have already bought into the Cisco narrative of terminating FCoE at the ToR and eventually extending the unified fabric to the EoR and the core. The endgame is the eventual replacement of the FC SAN with an FCoE SAN. And that doesn't fit into the chassis-based FCoE termination paradigm. I am very surprised that you are thinking about this, or actually doing this, in your shop! I like people who think outside the box."

Yes, Cisco's slides are great; what is not shown is what happens when a broadcast storm occurs. Before FCoE, you just had to block the loop and reboot the switches; after FCoE, all the servers have to be restored from tape because the data is corrupted on the storage.

We don't like to be early adopters and debug live in production.

Also, our SAN has been based on Brocade for years, and our LAN is based on Cisco. The SAN team just doesn't want to drop Brocade.

"As for oversubscription and 40G as the remedy, what is it going to take to upgrade all of one's server CNAs, switch ports/FEX ports, and (perhaps) twinax cables to support 40G? It's not as if one only has to do a software upgrade on a device and be done with it. So, the 40G argument has limited merit, I think."

I was only thinking of 40GE for aggregating uplinks at the access; to me, 10GE to the server is sufficient when I look at the real bandwidth needs of a single server, even with I/O consolidation.

"...after FCoE all the servers have to be restored from the tapes as datas are corrupted on the storage."

Interesting. Can you please elaborate on that? I had never thought of that before, nor have I ever heard of such a concern.

When you share the wire and the switch, you share the risks. The control and data planes are common, so when you face an incident (a storm, defective optics) you impact both environments. During my trainings I heard stories from customers who had to restore their servers from backups after a broadcast storm occurred and killed all the Nexus 5ks (consolidating I/O in the access layer).