Nexus 7000 - unexpected shutdown of vPC-Ports during reload of the primary vPC Switch

Dear Community,

We experienced unusual behavior with two Nexus 7000 switches in a vPC domain.

As shown in the attached sketch, we have four N7Ks in two data centers; in each data center, two N7Ks form a vPC domain.

Both data centers are connected via a multilayer vPC.

We had to reload one of these switches, and I expected the other N7K in this vPC domain to continue forwarding over its vPC member ports.

Instead, all vPC ports on the secondary switch were suspended until the reload of the first N7K (vPC role: primary) had finished.

Logging on Switch B:

20:11:51 <Switch B> %VPC-2-VPC_SUSP_ALL_VPC: Peer-link going down, suspending all vPCs on secondary

20:12:01 <Switch B> %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 1, VPC peer keep-alive receive has failed

In case of a peer-link failure, I would expect this behavior if the other switch were still reachable over the peer-keepalive link (which uses the mgmt port). But since we reloaded the whole switch, the vPCs on the secondary should have continued forwarding.

Could this be a bug, or are there any timers that need tuning?

All N7K switches are running NX-OS 6.2(8).

Switch A:

vpc domain 1
  peer-switch
  role priority 2048
  system-priority 1024
  peer-keepalive destination <Mgmt-IP-Switch-B>
  delay restore 360
  peer-gateway
  auto-recovery reload-delay 360
  ip arp synchronize

interface port-channel1
  switchport mode trunk
  switchport trunk allowed vlan <x-y>
  spanning-tree port type network
  vpc peer-link

Switch B:

vpc domain 1
  peer-switch
  role priority 1024
  system-priority 1024
  peer-keepalive destination <Mgmt-IP-Switch-A>
  delay restore 360
  peer-gateway
  auto-recovery reload-delay 360
  ip arp synchronize

interface port-channel1
  switchport mode trunk
  switchport trunk allowed vlan <x-y>
  spanning-tree port type network
  vpc peer-link

Best regards

Problem solved:

During the reload of the Nexus 7K, the linecards were powered off shortly before the mgmt interface. As a result, the secondary Nexus 7K still received at least one vPC peer-keepalive message after its peer-link had already gone down. Because the peer still appeared to be alive, the secondary suspended its vPC member ports to avoid a split-brain scenario.

We now use dedicated linecard interfaces for the vPC peer-keepalive link, and reloading one N7K no longer results in a total network outage.
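For readers who want to try the same approach: a dedicated keepalive link on linecard ports is typically built as a routed interface placed in its own VRF. The fragment below is only a minimal sketch for Switch A, not the configuration actually used here (the exact ports and addressing were not posted). The VRF name `vpc-keepalive`, the interface and the IP addresses are hypothetical placeholders, written in the same `<...>` style as the configs above.

```
! Sketch only: VRF name, interface and IP addresses are hypothetical placeholders
vrf context vpc-keepalive

interface Ethernet<slot/port>
  no switchport
  vrf member vpc-keepalive
  ip address <KA-IP-Switch-A>/30
  no shutdown

vpc domain 1
  peer-keepalive destination <KA-IP-Switch-B> source <KA-IP-Switch-A> vrf vpc-keepalive
```

On Switch B the same pattern would apply with the source and destination addresses swapped.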