09-18-2010 03:29 PM
I believe it may not be possible, but I thought I'd ask around to see if anyone else had run into this. I had an interesting (understatement!) failure today. Our data center experienced a classic cascading power failure (a circuit failure shunts load to the "backup" or "secondary" circuit, overloading that circuit). The initial end result was a fully down N5K that had been a member of a functioning vPC pair. The other N5K remained up. However, apparently the data center operators, in their zeal to troubleshoot, repeatedly cycled power on both the down N5K and the N5K that had initially stayed up.
The combo joy was a dead N5K (green lights, no boot, TAC case already opened for RMA) and the other N5K that booted without ever having a functioning vPC peer.
The end result was a production network completely down, even though 1 of the N5Ks was up and available. Since it couldn't talk to its peer, it wouldn't bring any of the vPCs up (uplinks, server links, or the peer-link). I poked around and didn't see any obvious way to "force" it to bring them up. I'll be opening a TAC case for that, just to be complete. I've also sent a query to my SE team.
Nevertheless, I wondered if anyone else had experienced this scenario and had come up with a solution. I was fortunate in that I had a spare N5K that I could configure identically to the failed unit and migrate the connections; the vPCs all came up fine after that. When I first installed this system, I tested single switch failures and that worked as expected (once the vPC is up, it survives a partner failure just fine). I just didn't consider the possibility of both switches rebooting, with 1 failing to power back up.
Thoughts?
Hagen
09-20-2010 08:16 AM
Yes I have seen this in our lab environment.
Looks like it's fixed in a pending release.
The only way I could make it work was to remove the "vpc xx" statement from all the port-channels until the dead 5K recovered.
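For anyone landing here later, the workaround above might look roughly like the sketch below on the surviving N5K. The port-channel number and vPC number (10) are placeholders, not values from this thread, and the steps would need to be repeated for every vPC-enabled port-channel. Treat it as an illustration to verify against your own configuration and software release, not a tested procedure.

```
! Hypothetical example: port-channel 10 / vpc 10 are placeholder numbers.
! Run on the surviving switch while its vPC peer is dead.
configure terminal
 interface port-channel 10
  ! Drop the vPC binding so this behaves as an ordinary port-channel
  no vpc 10
 end

! Once the failed peer is back and the peer-link is up again,
! restore the original vPC binding:
configure terminal
 interface port-channel 10
  vpc 10
 end
```

Saving a copy of the running configuration before making the change makes it easier to put the vpc statements back exactly as they were.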
09-20-2010 08:24 AM
Darren, that's it exactly. I guess I'm pending the release of software! I'm hopeful it won't happen again, but nice to know I'm not crazy (at least this time). Dumping the vPC config popped into my head, but since I had the shelf spare N5K, I went for that "fix".
Thanks for the bug ID.
Hagen
09-20-2010 08:41 AM
This could be a scary situation for sure in a production data center environment with the exact conditions you described.
I'm hoping this will be fixed in 5.0(2)N1(1), and you'll also get some cool new features like FEX pre-provisioning, config-sync, and config rollback.