02-15-2011 10:14 AM
I'm doing redundancy tests for servers connected through CNAs (Ethernet only for the moment) to two Nexus 5010 switches running vPC. The test lab looks like the attached file. Additionally, both Nexus 5K switches are connected through their management ports, and the vPC keepalive uses that link. These are simple tests for now: the test machine ping-floods the ESX host (about 1000 pings per second) and I count how many pings I lose in each case, which gives me a rough downtime in milliseconds (at ~1000 pings per second, each lost ping corresponds to roughly 1 ms of outage).
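For reference, a minimal sketch of the relevant vPC configuration on each 5010, assuming the peer-keepalive runs over mgmt0 in the management VRF; the domain number, IP addresses, and port-channel numbers are illustrative, not copied from my lab:

feature vpc
feature lacp

vpc domain 1
  peer-keepalive destination 192.168.0.2 source 192.168.0.1 vrf management

interface port-channel1
  description vPC peer link between the two N5Ks
  switchport mode trunk
  vpc peer-link

interface port-channel10
  description vPC member channel toward the ESX host CNA
  switchport mode trunk
  vpc 10

The mirror configuration on the second switch swaps the keepalive source and destination addresses.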
So far I've tested taking down individual links to the servers, the uplinks, and the vPC peer link, and failover times have been very good, as expected.
My main problem is in a scenario where one N5K fails. When I take the Nexus down, failover is very quick, as expected, similar to any other link failure.
The problem comes when the Nexus comes back up. At that point I see a downtime of over 3 seconds, and I'm not sure why. I can see that at the moment of the outage the vPC keepalive shows the other device as up; in fact, the downtime seems to start when the vPC keepalive comes up, so I'm thinking this has to do with some kind of vPC inconsistency (there is not much time to test, and each reboot takes a long time). Changing the keepalive timer values doesn't help much either, so it doesn't seem to be related to that.
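For what it's worth, these are the commands I'd check on the rebooted switch to look for the suspected inconsistency, plus the delay-restore timer, which is one knob that affects how vPC member ports are brought back after a reload; the 120-second value below is only an illustrative guess, not a confirmed fix for the 3-second gap:

show vpc
show vpc peer-keepalive
show vpc consistency-parameters global

! Keeps vPC member ports suspended for the configured number of seconds
! after the peer link is restored (value shown is illustrative):
vpc domain 1
  delay restore 120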
Can anyone explain what might be the root of this problem and how to overcome it?
Thanks
Francisco
03-01-2011 06:37 AM
franciscohs, we probably need to debug this further; you might want to open a case with Cisco TAC on this.