We experienced a total blackout last night in our data center, which is composed of four Nexus 7000 switches in two vPC domains. We run EIGRP routing on these devices and inter-VLAN routing with HSRP. Some FEXs are directly attached to the Nexus switches.
One of our chassis rebooted because of a read-only filesystem (a bug!).
During the reboot, the second vPC chassis worked well and all servers and storage kept their network connectivity.
The first switch came back up with its default startup configuration... so no license, every port in shutdown state, no features, etc.
I started to reconfigure the first vPC chassis from a backup config. Everything went well until I recreated the vPC domain ID.
Before creating the vPC domain ID, I had already added every VLAN (L2) to the config.
I did the operations in this order:
1) create Po1 for the peer-link
2) create the vPC domain ID with role priority 10 (the second chassis has role priority 20)
3) add the physical interfaces to Po1 for the peer-link, in no shut state
Right after step 3, the second chassis reported a "Type-2 inconsistency" and removed every VLAN from the whole vPC!!
=> blackout of 100% of our production servers.
To resolve the problem, I added the interface VLANs and every vPC port-channel to the configuration, without configuring the physical interfaces.
We lost 12 minutes of L2 communication and every server crashed.
We cannot configure vpc on a port-channel before the domain ID is created, but if the domain ID is created without the port-channels and their vPCs, a Type-2 inconsistency happens...
Has anybody experienced the same problem?
How can I do a full configuration restore without losing traffic?
Thanks for any help!
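For anyone hitting the same symptom, the standard vPC show commands list which parameters each peer sees as mismatched, which helps tell a global (Type-1) problem from a Type-2 one before more ports are brought up. A sketch only; port-channel20 is a hypothetical vPC member, not taken from this post:

```
! Overall vPC status, peer status and per-vPC consistency
show vpc brief
! Global parameters compared between the peers
show vpc consistency-parameters global
! VLAN-level comparison between the peers
show vpc consistency-parameters vlans
! Per port-channel comparison (port-channel20 is a hypothetical example)
show vpc consistency-parameters interface port-channel20
```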
As per the doc: the switch with the lower priority will be elected as the vPC primary switch. If the peer-link fails, the vPC peer will detect whether the peer switch is alive through the vPC peer-keepalive link. If the vPC primary switch is alive, the vPC secondary switch will suspend its vPC member ports to prevent potential looping, while the vPC primary switch keeps all its vPC member ports active.
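For reference, the election described above is driven by `role priority` under the vPC domain, and the peer-keepalive is defined in the same place. A minimal sketch of the two peers in this thread: priorities 10 and 20 come from the original post, while domain 100, the 192.0.2.x addresses and the use of VRF management are assumptions:

```
! Switch 1: intended primary (lower role priority wins)
vpc domain 100
  role priority 10
  peer-keepalive destination 192.0.2.2 source 192.0.2.1 vrf management
! Switch 2: intended secondary
vpc domain 100
  role priority 20
  peer-keepalive destination 192.0.2.1 source 192.0.2.2 vrf management
! Check which role each switch actually holds
show vpc role
```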
Thank you for your response, but in our case it was not the secondary switch that suspended the vPCs...
The primary switch was down, and the secondary switch took over with the role "operational primary" and the sticky bit set to TRUE.
The main problem was that I brought up the peer-keepalive (PKA) and the peer-link before finishing the configuration of the port-channels and interface VLANs. As per TAC, the correct method to replace a Nexus is:
- Shutdown (or unplug) all the ports
- Ensure that the switch is fully configured
- Bring up the PKA
- Bring up the peer-link
- Bring up the other port-channels
- Bring up the orphan ports
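The bring-up part of the TAC order above could look like this on the replacement switch, assuming it is already fully configured with everything shut. All interface numbers here are hypothetical: Ethernet1/48 for the keepalive, Po1 with Ethernet1/1-2 for the peer-link, Po20 with member Ethernet1/20 as a vPC, and Ethernet1/10 as an orphan port.

```
! 1. Peer-keepalive, then wait for "peer is alive"
configure terminal
interface Ethernet1/48
  no shutdown
end
show vpc peer-keepalive
! 2. Peer-link, then check peer status and global consistency
configure terminal
interface Ethernet1/1-2
  no shutdown
interface port-channel1
  no shutdown
end
show vpc brief
show vpc consistency-parameters global
! 3. Other port-channels (one vPC shown as an example)
configure terminal
interface Ethernet1/20
  no shutdown
interface port-channel20
  no shutdown
end
! 4. Orphan ports (single-attached devices)
configure terminal
interface Ethernet1/10
  no shutdown
end
```

Checking each show command before moving to the next step is the point of the ordering: nothing carrying production traffic comes up until the peers agree.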
Hi Vincent, in that case that was the other thing on my mind: the order of these operations _is_ KEY, and as TAC mentions, that's the right order of operations. Glad that you got an answer.
We are also hit by this bug, but we are waiting for a scheduled reboot of the switch. In case the switch comes back without its config, can you please share the best way to bring back the primary switch with the least disruption?
Thanks for your help
The correct procedure to bring back the primary switch into the network is the following.
If your switch doesn't have any config after the reboot, you first need to check that the sticky bit is set to FALSE on the rebooted switch and TRUE on the secondary (operational primary)!
Use this command for that:
show system internal vpcm info all | i Sticky|Reload
If the sticky bit on the rebooted switch is set to TRUE, you need to re-apply the role priority under the vpc domain on this switch. The sticky bit is the mechanism used to keep the current role and prevent preemption.
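As a sketch of that check and fix: the first command is the one given above, and re-entering the role priority is the step described in this reply. Domain 100 is a hypothetical ID; use your own domain and the priority the switch had before (10 in the original post).

```
! Run on both peers and compare the sticky bit
show system internal vpcm info all | i Sticky|Reload
show vpc role
! On the rebooted switch only, if its sticky bit is TRUE
configure terminal
vpc domain 100
  role priority 10
end
! Confirm the sticky bit is now FALSE on this switch
show system internal vpcm info all | i Sticky|Reload
```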
After that you need to do the following:
- Shutdown all the ports (already shutdown in case of a blank config)
- Configure the vPC domain
- Configure the port-channel for the peer-link in shutdown state
- Configure the physical interface for the peer-link in shutdown state
- Configure the peer-keepalive in shutdown state
- Configure every vPC in no shutdown state (you don't need to configure the physical interface yet)
- Ensure that all the VLANs and interface VLANs (no shutdown) are created on your switch.
- Double check the sticky bit (FALSE on the rebooted switch, TRUE on the primary switch)
- Bring up the peer-keepalive (wait for the status "peer is alive")
- Bring up the peer-link (check that everything is OK)
- Configure all the physical interfaces for your vPCs and FEXs
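The staged configuration from the steps above (before anything is brought up) might look roughly like this. It is a sketch under assumptions, not a tested restore script: domain 100, Po1 with Ethernet1/1-2, keepalive port Ethernet1/48 in a VRF named vpc-keepalive, Po20/vpc 20, VLAN 100 and all addresses are hypothetical placeholders, and the licenses, EIGRP and the rest of your backup config still have to be added.

```
feature lacp
feature vpc
feature interface-vlan
feature hsrp
! Peer-keepalive port, kept shut for now (hypothetical dedicated port and VRF)
vrf context vpc-keepalive
interface Ethernet1/48
  no switchport
  vrf member vpc-keepalive
  ip address 192.0.2.1/30
  shutdown
! vPC domain with this switch's previous role priority
vpc domain 100
  role priority 10
  peer-keepalive destination 192.0.2.2 source 192.0.2.1 vrf vpc-keepalive
! Peer-link port-channel and its members, shut
interface port-channel1
  switchport
  switchport mode trunk
  vpc peer-link
  shutdown
interface Ethernet1/1-2
  switchport
  switchport mode trunk
  channel-group 1 mode active
  shutdown
! VLANs and SVIs with HSRP, no shutdown
vlan 100
interface Vlan100
  ip address 10.100.0.2/24
  hsrp 100
    ip 10.100.0.1
  no shutdown
! vPC port-channels, no shutdown, physical members not configured yet
interface port-channel20
  switchport
  switchport mode trunk
  vpc 20
  no shutdown
```

The physical members of Po20 and the FEX uplinks are deliberately left out; per the procedure, they are configured only once the peer-keepalive and peer-link are up and consistent.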
Maybe you should say a small prayer also ;-)