Solved: Primary Nexus 5000 reload causes secondary Nexus 5000 to shut down vPCs

asaykao73 · 09-07-2011

Hi All,

Last night we were testing failover of our Nexus 5000 ToR switches and found that they did not behave as expected. In this rack we have two Nexus 5000s connected directly to a series of ESX servers. The ESX server uplinks are bundled into vPCs back to the two Nexus 5000s. No N2000 fabric extenders are in use here.

We simulated a failover by reloading the primary Nexus 5000 and expected the secondary Nexus 5000 to keep the vPCs up, but this wasn't the case. When the primary Nexus 5000 was reloaded, the secondary Nexus 5000 suspended all its vPCs. What's going on here? Why didn't the secondary Nexus 5000 keep the vPCs up? We then performed the same failover test by reloading the secondary Nexus 5000 instead, and this worked fine: the primary Nexus 5000 maintained the vPCs.

Any ideas as to why the secondary Nexus 5000 failed to keep the vPCs up while we reloaded the primary Nexus 5000? I thought failover would be almost seamless when one switch fails; otherwise, why have two Nexus switches in the rack? Could it be that we're running quite an old NX-OS release (4.2)?

Software

BIOS: version 1.2.0

loader: version N/A

kickstart: version 4.2(1)N2(1)

system: version 4.2(1)N2(1)

power-seq: version v1.2

BIOS compile time: 06/19/08

kickstart image file is: bootflash:/n5000-uk9-kickstart.4.2.1.N2.1.bin

kickstart compile time: 7/28/2010 18:00:00 [07/29/2010 11:10:19]

system image file is: bootflash:/n5000-uk9.4.2.1.N2.1.bin

system compile time: 7/28/2010 18:00:00 [07/29/2010 15:18:12]

Hardware

cisco Nexus5010 Chassis ("20x10GE/Supervisor")

Intel(R) Celeron(R) M CPU with 2074284 kB of memory.

Thanks in advance.

Andy

Jerry Ye · 09-07-2011

I am going to assume you are using the management interface for the vPC peer-keepalive link. If so, you are hitting bug ID CSCti82166:

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCti82166

When you reload the primary, this bug causes the switch to take a little too long to bring down the management interface, so the peer-link goes down before the management interface does. The secondary sees the peer-link drop while keepalives are still arriving over the management interface, concludes that the primary is still alive, and suspends its vPCs. You can either upgrade to 5.0(2)N1(1) or later, or stop using the management interface for the peer-keepalive.
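To make the sequence concrete, here is a rough Python sketch of the decision the secondary makes when the peer-link drops. It is a simplified model of general vPC behavior, not NX-OS code, and the names (`Action`, `secondary_reaction`) are made up purely for illustration.

```python
from enum import Enum


class Action(Enum):
    KEEP_VPCS_UP = "keep vPCs up"
    SUSPEND_VPCS = "suspend vPCs"


def secondary_reaction(peer_link_up: bool, keepalive_alive: bool) -> Action:
    """Simplified model of how a vPC secondary reacts (hypothetical helper)."""
    if peer_link_up:
        # Normal operation: nothing to decide.
        return Action.KEEP_VPCS_UP
    if keepalive_alive:
        # Peer-link is down but the peer still answers keepalives, so the
        # secondary assumes the primary is alive and suspends its own vPCs
        # to avoid a dual-active situation.
        return Action.SUSPEND_VPCS
    # Peer-link and keepalive are both down: the peer is treated as dead
    # and the secondary keeps forwarding on its vPCs.
    return Action.KEEP_VPCS_UP


# Expected reload order: the keepalive is already gone when the peer-link drops.
print(secondary_reaction(peer_link_up=False, keepalive_alive=False).value)
# -> keep vPCs up

# Order seen with the bug: the peer-link drops while mgmt0 still answers.
print(secondary_reaction(peer_link_up=False, keepalive_alive=True).value)
# -> suspend vPCs
```

The second call is the case described above: because the management interface lingers, the secondary treats a reload of the primary as if only the peer-link had failed.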

HTH,

jerry

asaykao73 · 09-07-2011

Hi Jerry,

That's exactly what happened. Thanks for the bug ID.

Cheers.

Andy