Hi all,
I've implemented a back-to-back vPC on a data center network using four Nexus 9372X switches distributed across two different sites. Each pair of distributed Nexus switches forms a vPC domain, and the auto-recovery feature is configured on each domain. Two of the Nexus switches run L3; the other two are L2 only. Because of hardware and geographical limitations, the peer-keepalive messages are carried over an L3 connection that traverses external switches. The release is 7.0(3)I3(1).
Now I've run into this issue: one of the two L3 Nexus switches was manually reloaded. I expected its peer in the vPC domain to take over the traffic, but instead it shut down all vPC interfaces, creating a traffic black hole that lasted until the primary vPC domain member came back online. These are the significant messages found in the log:
15:13:00 NX_BF1_Pri %VPC-2-VPC_SUSP_ALL_VPC: Peer-link going down, suspending all vPCs on secondary
15:21:04 NX_BF1_Pri %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 10, VPC peer keep-alive receive has failed
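To put a number on the window between the two events, here is a small Python sketch that pulls the timestamps of the two messages above and computes the gap between the peer-link failure and the keepalive failure. It assumes the lines keep the exact "HH:MM:SS host %MNEMONIC: text" layout quoted here; the function names are hypothetical helpers, not anything from NX-OS.

```python
from datetime import datetime

# The two syslog lines quoted above, assumed to use the layout "HH:MM:SS host %MNEMONIC: text".
LOG = [
    "15:13:00 NX_BF1_Pri %VPC-2-VPC_SUSP_ALL_VPC: Peer-link going down, suspending all vPCs on secondary",
    "15:21:04 NX_BF1_Pri %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 10, VPC peer keep-alive receive has failed",
]


def event_time(lines, mnemonic):
    """Return the timestamp of the first line carrying the given syslog mnemonic."""
    for line in lines:
        stamp, _host, tag = line.split(" ", 3)[:3]
        if tag == f"%{mnemonic}:":
            # Time of day only; a midnight rollover is not handled in this sketch.
            return datetime.strptime(stamp, "%H:%M:%S")
    raise ValueError(f"{mnemonic} not found in log")


def keepalive_detection_gap(lines):
    """Seconds between the peer-link failure and the keepalive-failure message."""
    link_down = event_time(lines, "VPC-2-VPC_SUSP_ALL_VPC")
    ka_fail = event_time(lines, "VPC-2-PEER_KEEP_ALIVE_RECV_FAIL")
    return int((ka_fail - link_down).total_seconds())


if __name__ == "__main__":
    print(keepalive_detection_gap(LOG))  # 484, i.e. 8 min 4 s
```

So the keepalive failure was only logged about 8 minutes after the peer-link went down.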
I found some documentation describing similar, expected behavior on the Nexus 7000, but it refers to a scenario where the mgmt interface carries the peer keepalive and the switch is reloaded by software; moreover, that behavior should be solved by the auto-recovery feature.
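To make my expectation explicit, here is a deliberately simplified Python model of how I understand the vPC secondary should treat its vPC ports. It is a sketch of the expected outcome only, not Cisco's actual state machine: it ignores timers (keepalive timeout, reload delay) and role negotiation, and the `Action` names and `secondary_action` function are hypothetical.

```python
from enum import Enum


class Action(Enum):
    KEEP_FORWARDING = "keep vPCs up"
    SUSPEND_VPCS = "suspend vPCs"
    TAKE_OVER = "become operational primary and re-enable vPCs"


def secondary_action(peer_link_up: bool, keepalive_up: bool, auto_recovery: bool) -> Action:
    """Simplified model of what the vPC secondary does with its vPC member ports."""
    if peer_link_up:
        # Normal operation: the peer-link is healthy, both peers forward on vPCs.
        return Action.KEEP_FORWARDING
    if keepalive_up:
        # Peer-link down but the peer is still alive: suspend vPCs to avoid dual-active.
        return Action.SUSPEND_VPCS
    # Peer-link down followed by keepalive loss: the peer is presumed dead.
    # With auto-recovery the secondary should take over; without it, vPCs stay suspended.
    return Action.TAKE_OVER if auto_recovery else Action.SUSPEND_VPCS


if __name__ == "__main__":
    # The reload case I expected: peer-link and keepalive both lost, auto-recovery on.
    print(secondary_action(peer_link_up=False, keepalive_up=False, auto_recovery=True))
```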
Do you know of any related issue on the Nexus 9000? What am I missing?
Thanks
Chiara