Nexus 7000 - unexpected shutdown of vPC-Ports during reload of the primary vPC Switch

Wilfried Olthoff · 10-01-2014

Dear Community,

We experienced unexpected behavior on two Nexus 7000 switches within a vPC domain.

As shown in the attached sketch, we have four N7Ks in two data centers; in each data center, two N7Ks form a vPC domain.

Both data centers are connected via a Multilayer-vPC.

We had to reload one of these switches, and I expected the other N7K in this vPC domain to continue forwarding over its vPC member ports.

Instead, all vPC ports on the secondary switch were disabled until the reload of the first N7K (vPC role: primary) had finished.

Logging on Switch B:

20:11:51 <Switch B> %VPC-2-VPC_SUSP_ALL_VPC: Peer-link going down, suspending all vPCs on secondary

20:12:01 <Switch B> %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 1, VPC peer keep-alive receive has failed

I would expect this behavior after a peer-link failure if the other switch were still reachable via the peer-keepalive link (via the mgmt port). Since we reloaded the whole switch, however, the vPCs should have continued forwarding.

Could this be a bug or are there any timers to be tuned?

All N7K switches are running NX-OS 6.2(8).

Switch A:

vpc domain 1
  peer-switch
  role priority 2048
  system-priority 1024
  peer-keepalive destination <Mgmt-IP-Switch-B>
  delay restore 360
  peer-gateway
  auto-recovery reload-delay 360
  ip arp synchronize

interface port-channel1
  switchport mode trunk
  switchport trunk allowed vlan <x-y>
  spanning-tree port type network
  vpc peer-link

Switch B:

vpc domain 1
  peer-switch
  role priority 1024
  system-priority 1024
  peer-keepalive destination <Mgmt-IP-Switch-A>
  delay restore 360
  peer-gateway
  auto-recovery reload-delay 360
  ip arp synchronize

interface port-channel1
  switchport mode trunk
  switchport trunk allowed vlan <x-y>
  spanning-tree port type network
  vpc peer-link

Best regards

Wilfried Olthoff · 10-19-2014

Problem solved:

During the reload of the Nexus 7K, the line cards were powered off shortly before the mgmt interface. As a result, the secondary Nexus 7K received at least one vPC peer-keepalive message after its peer-link had already gone down. To avoid a split-brain scenario, it shut down its vPC member ports.
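
For anyone who wants to check for the same sequence, the standard NX-OS show commands below are one way to compare the keepalive state with the peer-link state on the surviving switch. This is only a sketch of where to look. The output is not reproduced here because it depends on the platform and release.

```
! Run on the surviving switch (Switch B in this thread)
show vpc brief
show vpc role
show vpc peer-keepalive
show logging logfile | include VPC
```

If the keepalive status still shows the peer as alive at the moment the peer-link goes down, the secondary has no way to tell a reload from a peer-link failure and suspends its vPCs.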

We now use dedicated interfaces on the line cards for the vPC peer-keepalive link, and reloading one N7K no longer causes a total network outage.
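
A minimal sketch of what such a setup could look like on Switch A is shown below. It is not a copy of our running configuration: the VRF name vpc-keepalive, the interface Ethernet1/1, the /30 mask and the <Keepalive-IP-...> placeholders are assumptions chosen for illustration. Switch B would mirror it with the addresses swapped.

```
! Hypothetical VRF that carries only the keepalive traffic
vrf context vpc-keepalive

! Hypothetical routed port on a line card, connected to the peer
interface Ethernet1/1
  no switchport
  vrf member vpc-keepalive
  ip address <Keepalive-IP-Switch-A>/30
  no shutdown

! Point the keepalive at the peer's line card interface instead of mgmt0
vpc domain 1
  peer-keepalive destination <Keepalive-IP-Switch-B> source <Keepalive-IP-Switch-A> vrf vpc-keepalive
```

With the keepalive on a line card port, the keepalive path loses power together with the peer-link during a reload, so the secondary no longer sees a live peer behind a dead peer-link.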