Oversubscription Ratios (OSR)

lamav · 01-15-2011

Yes, ladies and gentlemen, the old oversubscription dilemma. What to think about when planning a data center deployment...

No doubt it is difficult to come up with a rule of thumb or some canned answer with regard to OSRs. There are so many variables: the characteristics of the applications running on the compute platforms, client expectations, application tolerance to delay and retransmissions, quality of service, the mission criticality of the application, host virtualization and the number of virtual machines running on the platform... I'm sure there are more, but you get the point.

In a classic hierarchical data center model, the access layer's OSR has typically been higher than that of the aggregation and core layers, for obvious reasons: as you go north in the layered model, the switching platforms have to handle more aggregated traffic. The advent of 10GE between layers has improved the OSR at each layer, most notably at the access layer. But it's a double-edged sword, isn't it? This improvement is only temporary, as the link speeds and offered data rates from server to access layer are also increasing. 10GE between server and access has once again raised the OSR. The pendulum should swing back toward more favorable OSRs with the advent of 40 and 100GE between switching layers.
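To put rough numbers on that pendulum, here is a minimal Python sketch. OSR is taken as total server-facing bandwidth divided by total uplink bandwidth; the 48-port switch with two uplinks is a hypothetical example, not any particular platform.

```python
def osr(server_ports: int, server_gbps: float, uplinks: int, uplink_gbps: float) -> float:
    """Oversubscription ratio X in X:1 = server-facing bandwidth / uplink bandwidth."""
    return (server_ports * server_gbps) / (uplinks * uplink_gbps)


# Hypothetical 48-port access switch with two uplinks, across link-speed generations.
scenarios = [
    ("1G servers, 2 x 1G uplinks", 48, 1, 2, 1),
    ("1G servers, 2 x 10G uplinks", 48, 1, 2, 10),
    ("10G servers, 2 x 10G uplinks", 48, 10, 2, 10),
    ("10G servers, 2 x 40G uplinks", 48, 10, 2, 40),
    ("10G servers, 2 x 100G uplinks", 48, 10, 2, 100),
]

for label, ports, port_gbps, uplinks, uplink_gbps in scenarios:
    print(f"{label}: {osr(ports, port_gbps, uplinks, uplink_gbps):g}:1")
```

With these assumptions the ratio goes 24:1, 2.4:1, back to 24:1 once servers move to 10G, then 6:1 and 2.4:1 as 40G and 100G uplinks arrive.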

On this board, there are many seasoned professionals who have had to negotiate this challenge, past and present. I am curious to know what metrics you used during the planning phase of your new data center deployments, and what thought leadership you brought to the table or saw from others. With all the variables one must take into account, is there indeed a best practice that loosely accounts for them and provides an accepted starting point? Sure, we would all love that golden 1:1 ratio, but how far from this ideal mark is acceptable?

I would love to hear some thoughts.....

I know this can make for a tedious discussion, so it's not for the faint-hearted. :-)

Regards

Victor

lamav · 01-16-2011

no takers?

lamav · 01-17-2011

OK - I've translated my question into every major language spoken on this board, so I am hoping to get some feedback. :-)

Amit Singh · 01-17-2011

Victor,

First of all, I was kind of amazed to see your other post in Hindi, an Indian language :-). I am originally from India, but even I found those heavy words hard to understand :-)

That's a great discussion to start about an ongoing challenge that we all face in our day-to-day engagements. When I meet with my customers, they all talk about 10G backbones, 40G uplinks, etc. The question is what they actually need. When we drill down further in our discussions, I ask them: have you ever monitored your network and your server traffic? What utilization do you see on the server interfaces, in both the LAN and SAN worlds? 95% of the answers I get are: we haven't done that yet, but we think 10G is the right solution for us.
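Getting that baseline does not take much: two readings of an interface's octet counter taken some interval apart give the average utilization. A minimal sketch, assuming 64-bit counters such as IF-MIB ifHCInOctets polled by whatever tool you already use (the polling is not shown); the sample values are made up for illustration.

```python
def utilization_pct(octets_t0: int, octets_t1: int, interval_s: float,
                    link_bps: float, counter_bits: int = 64) -> float:
    """Average utilization (%) between two octet-counter samples.

    Handles a single counter wrap; assumes the polling interval is short
    enough that the counter wrapped at most once.
    """
    delta = octets_t1 - octets_t0
    if delta < 0:  # counter wrapped between the two samples
        delta += 2 ** counter_bits
    bits_per_s = delta * 8 / interval_s
    return 100.0 * bits_per_s / link_bps


# Hypothetical samples taken 300 s apart on a 10G server port.
print(f"{utilization_pct(1_000_000_000, 4_750_000_000, 300, 10e9):.2f}%")
```

Averages over long intervals smooth out bursts, so it is worth looking at peak samples as well as the mean.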

In a recent engagement, one customer wanted every server on 10G and all the SAN FC ports on 8G. I asked the same questions, and they had no data to prove they would really need those interfaces in the next 5 years. The solution we eventually designed was a hybrid: 1G for the rack servers and 10G for the VMware environment, plus an oversubscribed SAN design that can be upgraded to 8G FC ports when they deploy their new VTLs.

In fact, Ethernet networks were originally designed to be oversubscribed. In my view, oversubscription is good in a network: it lets you design a good solution for less money and use the network devices more effectively. You cannot really control oversubscription across the network. For example, if you have forty 10G servers at the access layer, you don't want to design a distribution layer with 400G of uplink bandwidth; that network would be unrealistic for me. In fact, oversubscription puts the hardware's full potential, such as packet buffers and interface queues, to work, which helps justify your investment in hardware :-). Without that, one would wonder whether they had really made the right investment.

What we can really do is control the oversubscription at the main layers so the solution works effectively. You have to decide what oversubscription ratio works for you on the LAN side: 8:1, 16:1, or 24:1. That means, for a 48-port 1G switch, how many uplinks do you want: two, four, or a 10G interface?
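The arithmetic behind that choice can be sketched in a few lines of Python, working in both directions: uplinks needed for a target ratio, and the ratio a given set of uplinks yields. Only the 48-port 1G switch comes from the post; the rest is simple division.

```python
import math


def achieved_osr(ports: int, port_gbps: float, uplinks: int, uplink_gbps: float) -> float:
    """OSR (the X in X:1) for a given uplink configuration."""
    return ports * port_gbps / (uplinks * uplink_gbps)


def uplinks_needed(ports: int, port_gbps: float, target_osr: float, uplink_gbps: float) -> int:
    """Smallest number of uplinks that keeps the OSR at or below target_osr."""
    return math.ceil(ports * port_gbps / (target_osr * uplink_gbps))


# The 48-port 1G access switch from the post.
for target in (8, 16, 24):
    print(f"{target}:1 target -> {uplinks_needed(48, 1, target, 1)} x 1G uplinks")

for count, speed in ((2, 1), (4, 1), (1, 10)):
    print(f"{count} x {speed}G uplinks -> {achieved_osr(48, 1, count, speed):g}:1")
```

Hitting 8:1, 16:1, and 24:1 takes six, three, and two 1G uplinks respectively, while two 1G uplinks give 24:1, four give 12:1, and a single 10G uplink gives 4.8:1.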

Obviously, with more cloud and virtualization themes around, people are thinking about all-10G servers with 40G and 100G uplinks. Many vendors, like EMC, VMware, Cisco, Juniper, HP, IBM, QLogic, Intel, and many more, are all talking about 10G servers with lots of VMs on them and 10G ToR deployments. I see potential value in these servers running at 10G when they are:

1. Virtualized using technology like VMware, Citrix Xen, etc.

2. Using 10G FCoE for a converged infrastructure, with FCoE and Ethernet traffic sharing the same link

3. Built on a lossless architecture, so that FC traffic is always prioritized.

This way we are utilizing the 10G deployment to the maximum and justifying it. If customers just want a 10G NIC on a server that also has 2x 4G FC HBAs, it doesn't make sense to me: they end up with an under-utilized infrastructure, more CAPEX, and probably more OPEX.

Just to reiterate, it takes a series of calls and a lot of effort to get to the right solution for a customer. But it depends on how open your customer is about sharing their present challenges and future plans, and how willing they are to listen to you and trust you to design the best solution.

You cannot really control the oversubscription on a network; you have to live with it.

Just my two cents.

Cheers,

-amit singh

Jon Marshall · 01-17-2011

Victor

There is no definitive answer to this. Cisco docs have general OSR guidelines that you can reference, but it is very much application-dependent. Here are some of my experiences/thoughts on this (which I suspect some may disagree with):

OSRs often assume you can generate the traffic needed to hit those OSRs. In my experience this is very often not the case. Nowadays, pretty much any new DC design will look to deploy 10Gbps for scalability, flexibility, etc., but whether that bandwidth is actually needed is often debatable. It depends entirely on the applications the company runs and on how the DC is accessed (see below). So you have an access layer that is 10Gbps-capable, and because of this your distro/core now looks overloaded, but really, just how much traffic is actually being generated from that access layer? Too many companies buy into the latest and greatest bandwidth without ever really considering whether they need it.

And it then becomes self-perpetuating. As you say, with 40/100Gbps on the way the pendulum will swing back the other way. And this is exactly what the argument will then be, i.e. "you have 10Gbps to your access layer; if you want to improve your OSR, you really need 40 or 100Gbps in your distro/core layer", and so the company buys into that too.
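The gap between the ratio on paper and the traffic actually offered can be shown with a small Python sketch. All figures are hypothetical: a 48-port 10G access switch with 20G of uplinks, servers averaging 200 Mbps each, and the simplifying assumption that all of that traffic leaves the switch.

```python
def nominal_osr(ports: int, port_gbps: float, uplink_gbps_total: float) -> float:
    """OSR on paper: total port speed / total uplink speed."""
    return ports * port_gbps / uplink_gbps_total


def uplink_load_pct(measured_gbps: list[float], uplink_gbps_total: float) -> float:
    """Uplink utilization (%) if all measured server traffic leaves the switch."""
    return 100.0 * sum(measured_gbps) / uplink_gbps_total


PORTS = 48
UPLINK_GBPS_TOTAL = 20.0   # hypothetical: 2 x 10G uplinks
measured = [0.2] * PORTS   # hypothetical: each 10G server averages 200 Mbps

print(f"nominal OSR: {nominal_osr(PORTS, 10, UPLINK_GBPS_TOTAL):g}:1")
print(f"actual uplink load: {uplink_load_pct(measured, UPLINK_GBPS_TOTAL):.0f}%")
```

A 24:1 ratio sounds alarming, yet under these assumptions the uplinks run at about 48%.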

A further consideration is how access to the DC is provided. Often in designs you see a campus network directly attached to the DC, so access bandwidth is not such an issue. But from what I have seen, as often as not, the DC is accessed via the WAN. So you could end up with a lovely 10Gbps DC with WAN connections of 2 x 100Mbps. Now that is oversubscription. You could still argue that the bandwidth is needed within the DC for backups etc., but that is a separate argument and not one about providing the best possible access to end users.
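For users reached over the WAN, what they experience is set by the narrowest hop on the path, not by the DC fabric. A minimal sketch; the hop names and capacities are hypothetical apart from the 2 x 100 Mbps WAN from the example.

```python
def path_bottleneck_gbps(hops: dict[str, float]) -> tuple[str, float]:
    """Return the name and capacity (Gbps) of the narrowest link on a path."""
    name = min(hops, key=hops.get)
    return name, hops[name]


# Hypothetical path from a DC server out to a remote user.
path = {
    "server NIC": 10.0,
    "access uplinks": 20.0,
    "core": 40.0,
    "WAN (2 x 100 Mbps)": 0.2,
}

link, cap = path_bottleneck_gbps(path)
print(f"bottleneck: {link} at {cap:g} Gbps, "
      f"{path['server NIC'] / cap:g}:1 against a single 10G server")
```

Even a single 10G server is 50:1 oversubscribed against that WAN, whatever the OSR inside the DC.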

And, as with everything, it often comes down to the company and the applications it runs. The last company I worked for had over 1000 applications running in the DC. A handful of them had high bandwidth requirements, and that high bandwidth was server-to-server communication within the DC, not end user to server. That's what I was getting at when I talked about companies often buying into things they don't need. You have high bandwidth requirements for these few apps, so you need 10Gbps in the access layer. If you have 10Gbps in the access layer, you now have an OSR problem to the distro layer and the core, so what do you do?

Or you could simply make sure that server to server communication for these apps did not have to leave the access-layer.
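The effect of keeping that traffic local can be sketched in Python: only server-to-server traffic whose endpoints sit on different access switches, plus user traffic, has to cross the uplinks. The figures (30 Gbps east-west, 2 Gbps user traffic, 20 Gbps of uplinks) and the `local_fraction` placement metric are hypothetical.

```python
def uplink_traffic_gbps(server_to_server_gbps: float, local_fraction: float,
                        north_south_gbps: float = 0.0) -> float:
    """Traffic (Gbps) that must cross the access uplinks.

    local_fraction: hypothetical share of server-to-server traffic whose
    endpoints sit on the same access switch and so never leave it.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return server_to_server_gbps * (1.0 - local_fraction) + north_south_gbps


# Hypothetical: 30 Gbps east-west from the few heavy apps, 2 Gbps user traffic.
for local in (0.0, 0.5, 0.9):
    print(f"{local:.0%} local -> {uplink_traffic_gbps(30, local, 2):g} Gbps on uplinks")
```

With nothing kept local, 32 Gbps would overrun 20 Gbps of uplinks; keeping 90% of the east-west traffic on the switch brings it down to about 5 Gbps without touching the OSR.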

That's not to say OSRs are not important, and it's also not to say that there are no companies that need this throughput. But, paradoxically, for a lot of DCs, if they were actually designed and built for what the company needed rather than what was available, then the OSRs would actually be a lot more important.

I appreciate this may not be a direct answer to your question, but I thought I'd respond just to save you having to learn any more languages.

Jon

Nathan Spitzer · 01-17-2011

Amen brother.....

I have server people who want 10Gb connections and don't like it when I point out that 10Gb/s would melt the backplane of the server. Or that for 1/20 the cost I can give them dual (or quad) 1Gb connections with load-balancing, which will work fine for their workloads and are much less picky about cabling.
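Load-balancing across a dual or quad 1Gb bundle is typically done per flow: each conversation is hashed onto one member link, so many clients spread across the links while any single flow stays within one link's 1Gb. A rough Python sketch of that idea; the SHA-256 hash over source and destination addresses is an assumption for illustration, not what any particular switch or teaming driver actually uses.

```python
import hashlib
from collections import Counter


def member_link(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Pick a bundle member for a flow by hashing its address pair.

    Illustrative only: real switches and NIC teaming drivers choose their
    own hash inputs and algorithms.
    """
    key = f"{src_ip}-{dst_ip}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_links


# Hypothetical: one server talking to 200 clients over a 4 x 1G bundle.
flows = [("10.0.0.10", f"10.1.0.{i}") for i in range(1, 201)]
spread = Counter(member_link(src, dst, 4) for src, dst in flows)
print(sorted(spread.items()))
```

Because the same address pair always hashes to the same link, packets within a flow stay in order, which is the usual reason bundles balance per flow rather than per packet.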

People vastly underestimate just how much data 1Gb is, let alone 10Gb, whether it's SAN or network. The servers (devices) that can actually generate the I/O to push more than 1Gb/s across a single connection are a small fraction of most organizations, and if you have one that both can and does, you KNOW IT! Most likely it's one of these:

The SAN/NAS that takes up a whole row in your datacenter
The BIG database server(s) that service all 10,000 of your stores
The disk back-up system that backs the above up

Yes, the Walmarts, Amazons, Oracles, Los Alamos, and Visa/Mastercard/American Express of the world can generate and sustain that kind of load but most of us in the real world can't.
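To see how much data 1Gb really is, a short Python calculation helps. It assumes the link runs flat out at line rate with no protocol overhead, so real-world throughput is somewhat lower.

```python
def bytes_moved(link_gbps: float, seconds: float, efficiency: float = 1.0) -> float:
    """Bytes a link carries at full rate; efficiency < 1 models protocol overhead."""
    return link_gbps * 1e9 / 8 * seconds * efficiency


for label, secs in (("per second", 1), ("per hour", 3600), ("per day", 86400)):
    print(f"1Gb/s {label}: {bytes_moved(1, secs) / 1e9:,.3f} GB")
```

That works out to 125 MB per second, 450 GB per hour, and 10.8 TB per day, sustained; very few single servers move that much.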

Of course, when I tech-refreshed my campus I got 10Gb core connections 'cause it's cool.....

Nathan Spitzer

Sr. Network Communication Analyst

Lockheed Martin

lamav · 01-17-2011

Gentlemen:

I want to thank you very much for taking the time to engage me and provide me your thoughts and the benefit of your experience.

I agree with all of you that there are so many variables regarding acceptable OSRs that it's impossible to come up with some magic number. Notwithstanding, in the end we do have to pick an OSR that we think we can work with when we have to refresh a data center's server farm. So, one metric I have seen used is the "how does the network run now?" approach. If I have 1G servers with 4Gb port channels and everything is running great, then one can safely use that as a good starting point and not worry about deviating too much from it. However, the questions to take into consideration are whether virtualization and/or fabric convergence will figure into the new design. By the way, the average vMotion can really put a strain on even a 10G link.

I also agree that most network personnel have absolutely no idea what kind of workloads they are running in their data centers. And when you ask them, the thought of actually looking for some baseline information just sounds like too much work to be "bothered" with. Hence, they resort to the approach I mentioned above.

I will say, though, that the OSR is only part of the traffic-forwarding capability that must be taken into consideration. The absolute offered traffic and the actual available uplink bandwidth between switching layers (in other words, oversubscription expressed not as a ratio but as the average available bandwidth per entity) must figure into the design equation.

For example, with two 6140 Fabric Interconnects, 40 UCS chassis (320 servers) can be supported with an OSR of 8:1. That's not bad at all. However, if we look at actual available bandwidth, we can see that each server will have only 1.25Gb available for data and FCoE traffic. Now add virtualization on top of that. That is a really inadequate amount of available bandwidth, although the ratio in and of itself is not bad at all.
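Both the ratio and the per-server figure can be reproduced from the numbers in that example. A Python sketch under assumptions that are mine, not from the post: 8 half-width blades per chassis, each blade with one 10G port toward each fabric, and one 10G uplink from each I/O module to its 6140.

```python
def per_fabric(blades: int, blade_port_gbps: float, uplinks: int, uplink_gbps: float):
    """Return (OSR, Gbps per blade when every blade is busy) for one fabric of one chassis."""
    downstream = blades * blade_port_gbps
    upstream = uplinks * uplink_gbps
    return downstream / upstream, upstream / blades


CHASSIS = 40
BLADES_PER_CHASSIS = 8   # assumption: half-width blades
BLADE_PORT_GBPS = 10     # assumption: one 10G port per blade toward each fabric
IOM_UPLINKS = 1          # assumption: one uplink per I/O module to each 6140
IOM_UPLINK_GBPS = 10

ratio, per_blade = per_fabric(BLADES_PER_CHASSIS, BLADE_PORT_GBPS, IOM_UPLINKS, IOM_UPLINK_GBPS)
print(f"servers: {CHASSIS * BLADES_PER_CHASSIS}")
print(f"OSR per fabric: {ratio:g}:1")
print(f"per server per fabric, all busy: {per_blade:g} Gbps")
```

The 8:1 looks comfortable, but with every blade busy each one gets 1.25 Gbps per fabric to share between Ethernet and FCoE, which is the gap between ratio and absolute bandwidth the post describes.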

Would love to have more feedback and hear the experiences of our talented community.

Thanks, again!

Shashank Singh · 01-17-2011

Hi Everybody,

I found this discussion really helpful as we have thoughts from so many people who seem to have experienced oversubscription in Cisco equipment. I would just like to add some more of my thoughts.

The basic concept behind OSR is that not all clients utilize 100% of the interface bandwidth at all times. So, whether an oversubscribed device can work fine in a network depends on the answer to this question: will you utilize 100% of the interface bandwidth on all ports at all times?

OSR actually makes 'available bandwidth per port' a fuzzy concept. There is no fixed value that can be calculated, as it depends on how heavily the other ports in the port group are being utilized. For example, with a fully populated port group on a gigabit line card with an OSR of 8:1, it may seem that each port has 1/8 gig of actual bandwidth. I am not disputing the math, but this is in fact the minimum value, and it holds true only when all the other seven ports in that port group are being utilized at 100% (which seldom happens). The actual available bandwidth floats between 1/8 gig and 1 gig, depending on how much traffic is being pumped across the remaining seven ports.
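That floating value can be modelled as a fair share of the port group's capacity. A minimal Python sketch, assuming the ASIC divides the group's 1 Gbps max-min fairly among its 8 ports (real arbitration, buffering, and QoS behaviour vary by line card); capacity and demands are in Gbps.

```python
def max_min_share(capacity: float, demands: list[float]) -> list[float]:
    """Max-min fair allocation of a shared capacity (water-filling).

    Ports asking for less than an even split get what they ask for; the
    leftover is split evenly among the ports that still want more.
    """
    alloc = [0.0] * len(demands)
    remaining = capacity
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for k, i in enumerate(order):
        fair = remaining / (len(demands) - k)
        alloc[i] = min(demands[i], fair)
        remaining -= alloc[i]
    return alloc


# Port 0 wants a full 1 Gbps; the other seven ports in the 8:1 group vary.
cases = {
    "others at 100%": [1.0] * 7,
    "others idle": [0.0] * 7,
    "mixed load": [0.5, 0.1] + [0.0] * 5,
}
for label, others in cases.items():
    share = max_min_share(1.0, [1.0] + others)[0]
    print(f"{label}: port 0 gets {share:.3f} Gbps")
```

Under this model port 0 gets 0.125 Gbps when the others are saturated, the full 1 Gbps when they are idle, and 0.45 Gbps in the mixed case, matching the 1/8 gig to 1 gig range described above.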

At the same time, marketing specs for devices tend to ignore the OSR, and the numbers presented are often maximum values (which are never achieved in practice). As others have already pointed out, there are variables other than bandwidth alone, such as the forwarding capacity of the ASIC behind the port group and the shared ingress and egress buffers for the port group.

Considering that most of the switching gear on the market is oversubscribed, the 'which one to buy' question becomes increasingly complicated. Two things to keep in mind before purchasing any switching equipment are:

Will all the ports be used at almost 100% of the bandwidth at all times?
What is the possibility of more ports on the same box being utilized as a result of future expansion?

Cheers,

Shashank

lamav · 01-18-2011

Shashank, some very good points. You are right that not every host is going to transmit its maximum offered load all the time, so the 1.25Gb/s of available bandwidth per server is a minimum available uplink bandwidth. But this reality speaks to the original motivation for creating this thread: how one goes about determining what OSR and actual bandwidth are enough. There are variables to take into consideration vis-a-vis the actual workload bandwidth requirements, the characteristics of the application, client expectations, etc.

Good stuff....

Victor