Discussion between the network and server teams about whether VMotion needs its own physical NIC. Most of our ESXi 5.1 servers have six 1Gb NICs in a trunk passing all VLANs across that trunk. Supposedly a VM guru has said that best practice is to have VMotion on its own physical NIC, and I want some opinions on this. Currently we are running VMotion across the trunks with no issues. I have read several articles saying VMotion can eat up bandwidth, but I am not seeing that for the VMotion VLAN on our network; utilization is minimal. Some of the ESX servers are on dual 10Gb NICs that are trunked, and we will be moving toward UCS in the future. In my opinion, as long as VMotion is in a separate VLAN running across the trunk, it should work essentially the same as a separate physical link. I would love to hear other opinions, as this has been a running battle: the server guys want more physical links, whereas the network team wants to make the best use of our resources. Thanks for any info anybody can supply.
Best practice is to have VMotion and other traffic such as NLB separated from user traffic.
So really 2x 10G in an EtherChannel for user traffic, at least 1x 10G for VMotion, and then another port for things like NLB.
If this is not doable, it could cause problems big and small: users getting degraded service because VMotion is consuming bandwidth to transfer a virtual server between blades, or VMotion itself struggling because users are hogging all the bandwidth, leaving a server offline for longer.
Yes, that is some of what I have seen: that VMotion can take up a lot of bandwidth. But looking at SolarWinds Orion, the VLAN for VMotion maxes out at bps, not kbps, Mbps, or Gbps, and network utilization by an ESX server is running at about 1%. I do appreciate the feedback as I research this issue further, but currently bandwidth does not seem to be an issue, especially once we move to UCS. Thanks again for your input.
Hmm... if they are following their best practice, then as network engineers/consultants/specialists we should probably be helping them achieve it. But I do understand where you are coming from. Bandwidth is money!
What tests have you guys done to prove that the VMotion NIC sits at 0-1%? Have you actually monitored what happens while you perform a VMotion? My experience is that you need the bandwidth available to get the best performance, and any slowness can be catastrophic to the end user, application, or process.
So in the environment I work in, we actually have a dedicated 2Gb port channel for VMotion to each ESX host!
Remember, SolarWinds Orion polls at set intervals, e.g. every 1 minute or every 5 minutes. It's not actively monitoring as such, so it's not guaranteed to capture a spike or burst if there is one, hence you wouldn't see it.
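To make that concrete, here's a minimal Python sketch (the figures are assumptions for illustration, not measurements from your environment, and this is not Orion's actual polling logic) showing how a short VMotion burst disappears into a 5-minute polling average:

```python
# Assumed figures: a 30-second VMotion burst near line rate on a 10Gb link,
# with near-idle background traffic for the rest of a 5-minute poll window.
POLL_WINDOW_S = 300          # 5-minute polling interval
burst_duration_s = 30        # short vMotion transfer
burst_rate_gbps = 8.0        # peak rate during the burst
idle_rate_gbps = 0.001       # background traffic the rest of the window

# A counter-based poller effectively reports the window-wide average rate.
avg_gbps = (burst_rate_gbps * burst_duration_s +
            idle_rate_gbps * (POLL_WINDOW_S - burst_duration_s)) / POLL_WINDOW_S
print(f"peak: {burst_rate_gbps} Gbps, 5-min average: {avg_gbps:.2f} Gbps")
# prints: peak: 8.0 Gbps, 5-min average: 0.80 Gbps
```

So a link that briefly ran at 8 Gbps shows up as under 1 Gbps average, and with longer polling intervals or shorter bursts the reported figure shrinks further.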
Let's look at it this way too...
There is an application I am using which connects to a VM; let's take Exchange 2010 as an example. Suppose there is a failure, or not enough resources on the ESX host itself, so a decision is made to VMotion the VM to a different ESX host. During that time, my PC and Outlook disconnect, are affected, or crash. Why? Because the VMotion took a bit more time: the bandwidth wasn't there to make the transition fast enough.
That may not be the case when I have enough bandwidth / dedicated links for the VMotion to complete quickly: I probably won't notice any difference, or at most a reconnect.
The more bandwidth, the quicker the VMotion process completes, and the less impact it has on services.
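As a rough back-of-envelope sketch (a hypothetical 16 GB VM, ignoring dirty-page re-copies and protocol overhead, so real vMotion times will differ), here's what link speed does to the raw memory copy time:

```python
# Hypothetical figures, not measurements: raw time to copy a VM's RAM
# over links of different speeds (ignores dirty-page re-copy and overhead).
vm_memory_gb = 16                # assumed VM memory size
for link_gbps in (1, 2, 10):
    seconds = vm_memory_gb * 8 / link_gbps   # GB -> Gb, then divide by rate
    print(f"{link_gbps} Gbps link: ~{seconds:.0f} s of transfer time")
```

Even in this idealized case, the same VM that moves in about 13 seconds on a 10Gb link ties up a 1Gb link for over two minutes, which is the window in which users can feel the impact.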
PS: 1% seems quite low, unless your traffic profile is low and you don't have much running on your network, or very few VMs on the ESX host. Check the reference bandwidth in Orion to see if it matches the real interface bandwidth.
Please rate useful posts & remember to mark any solved questions as answered. Thank you.