09-07-2011 04:11 PM
Hi All,
Last night we were testing failover of our Nexus 5000 ToR switches and found that they did not behave as expected. In this rack we have two Nexus 5000s connected directly to a series of ESX servers. The ESX server uplinks are bundled into vPCs back to the two Nexus 5000s. No N2000 fabric extenders are in use here.
We simulated a failover by reloading the primary Nexus 5000 and expected the vPCs to stay up on the secondary Nexus 5000 - but this wasn't the case. When the primary Nexus 5000 was reloaded, the secondary Nexus 5000 suspended all its vPCs. What's going on here? Why didn't the secondary Nexus 5000 keep the vPCs up? We then performed the same failover test by reloading the secondary Nexus 5000 instead, and this worked fine because the primary Nexus 5000 maintained the vPCs.
Any ideas as to why the secondary Nexus 5000 failed to keep the vPCs up whilst we reloaded the primary Nexus 5000? I thought failover would be almost seamless when one switch fails, else why have two Nexus switches in the rack? Could it be that we're running quite an old NX-OS release (4.2)?
Software
BIOS: version 1.2.0
loader: version N/A
kickstart: version 4.2(1)N2(1)
system: version 4.2(1)N2(1)
power-seq: version v1.2
BIOS compile time: 06/19/08
kickstart image file is: bootflash:/n5000-uk9-kickstart.4.2.1.N2.1.bin
kickstart compile time: 7/28/2010 18:00:00 [07/29/2010 11:10:19]
system image file is: bootflash:/n5000-uk9.4.2.1.N2.1.bin
system compile time: 7/28/2010 18:00:00 [07/29/2010 15:18:12]
Hardware
cisco Nexus5010 Chassis ("20x10GE/Supervisor")
Intel(R) Celeron(R) M CPU with 2074284 kB of memory.
Thanks in advance.
Andy
09-07-2011 09:26 PM
I am going to assume you are using the management interface for the vPC peer-keepalive. If this is true, you are hitting bug ID CSCti82166:
When you reload the primary, this bug causes the switch to take a little too long to bring down the management interface. As a result, the peer-link goes down before the management interface, so the secondary still sees the peer-keepalive as alive and suspends its vPCs. You can either upgrade to 5.0(2)N1(1) or above, or stop using the management interface for the peer-keepalive.
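To illustrate why the order of those two events matters, here is a minimal Python sketch of the decision a vPC peer makes when the peer-link drops. It is a simplified model for explanation only, not NX-OS code; the names `Role` and `vpc_action` and the returned strings are made up for this example.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


def vpc_action(role: Role, peer_link_up: bool, keepalive_alive: bool) -> str:
    """Simplified model of what a vPC peer does with its vPC member ports.

    Assumption: this only captures the basic split-brain avoidance rule and
    ignores timers, auto-recovery and other features.
    """
    if peer_link_up:
        # Normal operation: both peers forward on their vPC member ports.
        return "keep vPCs up"
    if role is Role.PRIMARY:
        # The primary keeps forwarding when the peer-link is lost.
        return "keep vPCs up"
    if keepalive_alive:
        # Peer-link is down but the peer still answers keepalives: the
        # secondary assumes the primary is alive and suspends its vPCs.
        return "suspend vPCs"
    # Peer-link and keepalive are both gone: the peer is considered dead.
    return "take over as primary, keep vPCs up"


# Expected reload of the primary: keepalive is lost no later than the peer-link.
expected = vpc_action(Role.SECONDARY, peer_link_up=False, keepalive_alive=False)

# With the bug: the management interface stays up a little too long, so the
# secondary sees the peer-link fail while keepalives are still arriving.
with_bug = vpc_action(Role.SECONDARY, peer_link_up=False, keepalive_alive=True)

print("expected:", expected)
print("with bug:", with_bug)
```

In this model, the only difference between the two cases is whether the keepalive is still alive at the moment the peer-link drops, which is exactly what the slow management interface shutdown changes.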
HTH,
jerry
09-07-2011 10:43 PM
Hi Jerry,
That's exactly what happened. Thanks for the bug ID.
Cheers.
Andy