
VMware FT failover not transparent on Nexus1000v

roger.even
Level 1

I was wondering if anybody could help me with this one.

I'm staging a new virtual infrastructure architecture for my company, hosted on VMware ESX 4.1 with the N1KV 4.0(4)SV1(3b) solution on top.

When testing the final design, I noticed that a VMware FT-protected VM loses connectivity for approximately 8 seconds (actually losing TCP state!) upon an FT failover event while the VM is hosted under the N1K, which defeats the whole purpose of VMware FT.

However, when executing the very same operation with the VM hosted on the standard vSwitch on the same server, the failover event is completely transparent, i.e., no state loss.

Are there any known compatibility issues with VMware FT and the Nexus1000V product??

7 Replies

Robert Burns
Cisco Employee

Roger,

Currently this is a known issue, and Cisco & VMware are actively working towards a fix.  There's an issue where the 1000v is unable to bring up the FT VM's network interface until it receives a detach notification from the primary.

I will update this thread as soon as there is a permanent fix.

The only workaround for this is to use a "system vlan" on the FT VM's vEth interface.

Regards,

Robert

Dear Robert,

It has been more than a year to date. Any news on a permanent fix perhaps?

Rgrds,

Roger

Roger,

The bug we opened is still open, the reason being that VMware & Cisco need to address this jointly.

I did find out the following:

When using the FT failover test button, there is a delay of at most 9 seconds. It is caused by a 9-second delay in the FT test code that brings down the primary VM.

When simulating an actual ESX host outage for the primary VM (pulling the power cord on the ESX host where the primary VM is running), there is a delay of at most 6 seconds, because the VSM relies on heartbeats to detect that the VEM is down and only then removes the attach for the primary VM.

So it does appear that a "test" FT failover event does not produce the same result as an actual host failure.
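For anyone wanting to compare the two scenarios with their own numbers, here is a minimal sketch of how the outage window could be measured from a separate machine. It is not an official Cisco or VMware tool; the function names, the probe interval, and the idea of probing a TCP port on the FT VM are all assumptions made for illustration. Note that it opens a fresh TCP connection per probe, so it measures reachability only, not whether an already established TCP session survives the failover.

```python
import socket
import time
from typing import Iterable, List, Tuple

# One probe result: (monotonic timestamp in seconds, did the probe succeed?)
Sample = Tuple[float, bool]


def probe(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a new TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def collect(host: str, port: int, duration: float, interval: float = 0.2) -> List[Sample]:
    """Probe the target repeatedly for `duration` seconds and record each result."""
    samples: List[Sample] = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval)
    return samples


def longest_outage(samples: Iterable[Sample]) -> float:
    """Longest gap between the last good probe before a run of failures
    and the first good probe after it. Failure runs that start before any
    good probe, or that never recover, are ignored."""
    longest = 0.0
    last_ok = None
    in_outage = False
    for ts, ok in samples:
        if ok:
            if in_outage and last_ok is not None:
                longest = max(longest, ts - last_ok)
            last_ok = ts
            in_outage = False
        else:
            in_outage = True
    return longest
```

A run could look like `longest_outage(collect("192.0.2.10", 22, duration=60))`, started shortly before triggering the failover; the address and port here are placeholders. The result is only as precise as the probe interval plus the connect timeout.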

I've bumped my Dev team again to see if there have been any improvements to this since 1.3b.

The related bug is CSCtl04574.  So far your TAC SR is the only case linked to this bug.  I'll let you know what Development comes back with by end of the week.

Regards,

Robert

Yes, I find it very odd, to say the least, that we are obviously the only customer facing this very problem. At the time I thought that we were perhaps one of the early adopters running VMware FT in production, but to date, who isn't??

This workaround, which we have now had in place for more than a year (using a system vlan on non-system-vlan port groups that require FT functionality), is getting a little silly, as basic switchport functionality such as shutdown on a veth interface is not possible because of it.

N1KV01(config)# int veth59
N1KV01(config-if)# sh
ERROR: Cannot set port admin status to 'shutdown' for interface inheriting a system port-profile

We are currently running version 4.2.1.SV1.4

Roger,

In your instance, do you have any other features enabled in your FT profile?  (QoS, ACLs, etc.)

We may have a solution, but these features may not be available with it.  We're also still working on a permanent fix.

Regards,

Robert

Hi Robert,

Thank you for chasing this, much appreciated!!

Our VM facing port-profiles all have no feature specifics, e.g.,

port-profile type vethernet DATABASE-TIER
  vmware port-group
  switchport mode access
  switchport access vlan 831
  no shutdown
  system vlan 831
  max-ports 32
  state enabled

Thanks Roger. 

This is hot on my radar - let me circle back with the dev team with this info and see what we can do here.

I WILL be back!

Regards,

Robert
