aleks00011 · ‎01-30-2017

Hi,
We are running Cisco UCS (5108 chassis, IOM-2208XP, with M3/M4 blades connected to an FI 6296UP).

We want to go from 2 links to 4 links per IOM.

Once we add the 2 additional cables, we will re-ack one IOM at a time so that everything becomes aware of the extra links (we don't want to re-ack the whole chassis at once). Doing this should prevent any downtime from the blade's perspective, assuming a re-ack is enough to see the 2 extra links without rebooting the IOM.

The question is: since the VM traffic might be running through IOM-A or IOM-B, what will the "ping timeout" be from a virtual machine's perspective when we re-acknowledge an IOM?
In other words, how long does it take before the VM traffic fails over from IOM-A to IOM-B?

Many thanks!

Wes Austin · ‎01-30-2017

Hello,

Typically, when a failover event occurs, we drop a few pings while the gratuitous ARP informs the upstream switch of the MAC move. I have seen instances where only a single ping is dropped on a VM before the failover to the other fabric completes. It is fairly instantaneous.
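If you want to put a number on this during the re-ack, one option is to run a continuous ping from the VM and look at the longest run of lost replies. Below is a minimal sketch of that calculation. The function name and the input format (one boolean per ping, in send order) are assumptions for illustration, not part of any Cisco or VMware tool.

```python
from typing import Iterable


def longest_outage_seconds(replies: Iterable[bool], interval_s: float = 1.0) -> float:
    """Estimate the longest outage from a sequence of ping results.

    replies    -- True for a reply received, False for a lost ping (hypothetical format)
    interval_s -- time between pings in seconds (1.0 is the usual ping default)
    """
    longest = 0
    current = 0
    for ok in replies:
        if ok:
            current = 0          # a reply ends the current run of losses
        else:
            current += 1         # extend the current run of losses
            longest = max(longest, current)
    # Each lost ping accounts for roughly one interval of lost connectivity.
    return longest * interval_s
```

A shorter ping interval gives a finer estimate, since the result can only be resolved to a multiple of the interval.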

http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/sw/gui/config/guide/2-0/b_UCSM_GUI_Configuration_Guide_2_0/b_UCSM_GUI_Configuration_Guide_2_0_chapter_0100.html

In addition, a cluster configuration actively enhances failover recovery time for redundant virtual interface (VIF) connections. When an adapter has an active VIF connection to one fabric interconnect and a standby VIF connection to the second, the learned MAC addresses of the active VIF are replicated but not installed on the second fabric interconnect. If the active VIF fails, the second fabric interconnect installs the replicated MAC addresses and broadcasts them to the network through gratuitous ARP messages, shortening the switchover time.
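To make the paragraph above more concrete, here is a toy model of the idea: the standby side keeps a replicated copy of the learned MAC addresses but does not install them until the active side fails, at which point it installs them and announces them. This is only a sketch of the concept; the class, method names, and MAC address are invented for illustration and do not reflect how UCS Manager is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class FabricInterconnect:
    """Toy stand-in for one fabric interconnect (hypothetical model)."""
    name: str
    installed: Set[str] = field(default_factory=set)    # MACs actively forwarded
    replicated: Set[str] = field(default_factory=set)   # MACs held for standby VIFs

    def learn(self, mac: str, peer: "FabricInterconnect") -> None:
        # Active side installs the MAC; the peer only keeps a replica.
        self.installed.add(mac)
        peer.replicated.add(mac)

    def take_over(self) -> List[str]:
        # On failure of the active VIF, install the replicas and return
        # the MACs that would be announced via gratuitous ARP.
        announced = sorted(self.replicated)
        self.installed |= self.replicated
        self.replicated.clear()
        return announced


fi_a = FabricInterconnect("FI-A")
fi_b = FabricInterconnect("FI-B")
fi_a.learn("00:25:b5:00:00:01", fi_b)   # example MAC, made up
announced = fi_b.take_over()            # simulate failure of the FI-A path
```

Because the standby side already holds the addresses, it does not have to relearn them after the failure, which is why the switchover is short.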

HTH,

Wes

aleks00011 · ‎01-31-2017

Hi Wes,
Thank you for the quick answer!

The failover you describe is the case where failover is handled by UCS, correct? (vNIC hardware failover)

A colleague of mine pointed out that in our case this should be handled by VMware instead of UCS.
Since we have 2 active adapters in VMware carrying the VM traffic (over the A and B sides), my guess is now that we mostly depend on VMware's failover capability when we re-ack an IOM (which hopefully is as fast as the UCS failover).

Wes Austin · ‎01-31-2017

You are correct. If you are using failover in the host operating system, then the OS is responsible for moving traffic to the surviving path in the event of a failure. In my experience, it is just as fast as fabric failover.

As a best practice, do not use fabric failover (the vNIC failover you described above) and OS failover at the same time, as it can actually cause issues with data traffic if software and hardware are both attempting to do the same thing.
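As a quick way to audit this, you could list your vNICs with both settings and flag any that have the two enabled together. The sketch below assumes you have already collected that inventory yourself into a list of dictionaries; the keys `name`, `fabric_failover`, and `os_teaming` are hypothetical and do not come from any UCS or vSphere API.

```python
from typing import Dict, List


def double_failover_vnics(vnics: List[Dict]) -> List[str]:
    """Return the names of vNICs that have both failover mechanisms enabled.

    Each entry is a hand-built record (hypothetical format) such as:
    {"name": "eth0", "fabric_failover": True, "os_teaming": False}
    """
    return [
        v["name"]
        for v in vnics
        if v["fabric_failover"] and v["os_teaming"]
    ]
```

An empty result means no vNIC in the inventory has hardware and OS failover enabled together.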

https://pubs.vmware.com/vsphere-65/index.jsp?topic=%2Fcom.vmware.vsphere.networking.doc%2FGUID-D34B1ADD-B8A7-43CD-AA7E-2832A0F7EE76.html

-Wes

aleks00011 · ‎02-01-2017

Thanks, Wes!

Ping timeout for a VM when acknowledging or rebooting an IOM?