Re: nexus vpc peer-keepalive link with separate switch doubt

xiexiaoyang · ‎02-22-2023

According to the link below, why do we need to use a separate switch when we use the management interface and the management VRF to create a vPC peer-keepalive link?

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/6-x/interfaces/configuration/guide/b_Cisco_Nexus_9000_Series_NX-OS_Interfaces_Configuration_Guide/b_Cisco_Nexus_9000_Series_NX-OS_Interfaces_Configuration_Guide_chapter_0111.html

We recommend that you associate a peer-keepalive link to a separate virtual routing and forwarding (VRF) instance that is mapped to a Layer 3 interface in each vPC peer device. If you do not configure a separate VRF, the system uses the management VRF by default. However, if you use the management interfaces for the peer-keepalive link, you must put a management switch connected to both the active and standby management ports on each vPC peer device (see figure).

No data or synchronization traffic moves over the vPC peer-keepalive link; the only traffic on this link is a message that indicates that the originating switch is operating and running a vPC.

Christopher Hart · ‎02-22-2023

Hello!

When running the vPC Peer-Keepalive Link over each vPC peer's management interface on modular/chassis switches (e.g. Nexus 7000, Nexus 9500, etc.), it is highly recommended to connect the management interfaces to a separate, out-of-band management switch. This is because each modular/chassis switch may have two supervisors, and only one supervisor will be active at a time. If the management interface of each supervisor is directly connected to the management interface of the remote vPC peer's supervisor, connectivity through the vPC Peer-Keepalive Link may be broken if there is a mismatch between which supervisor is active on either chassis.

This is best demonstrated through an example. Let's say you have two Nexus 9508 switches, N9K-1 and N9K-2, that form a vPC domain. Each switch has two supervisors inserted - one in slot 27, and the other in slot 28. The management interface of N9K-1's supervisor in slot 27 is directly connected to the management interface of N9K-2's supervisor in slot 27. Similarly, the management interface of N9K-1's supervisor in slot 28 is directly connected to the management interface of N9K-2's supervisor in slot 28. Let's assume that the active supervisor of both chassis is the supervisor inserted in slot 27.

As part of normal operations (upgrading the NX-OS software of either switch, replacing hardware, reloading the switch to test failover capabilities, etc.) the active supervisor of either chassis may change from the supervisor in slot 27 to the supervisor in slot 28. The standby supervisor is not capable of sending or receiving packets on its management interface, since it's not the active supervisor. If N9K-1 is upgraded and the supervisor in slot 28 becomes the active supervisor, connectivity through the vPC Peer-Keepalive Link will fail since N9K-2's supervisor in slot 28 is the standby supervisor, not the active supervisor. Thus, N9K-1's active supervisor in slot 28 has no way to connect to N9K-2's active supervisor in slot 27.

On its own, the vPC Peer-Keepalive Link going down will not cause any operational changes in the vPC domain. However, if the vPC Peer-Keepalive Link is down, and then the vPC Peer-Link goes down, that can cause both vPC peers to assume the other has gone down, causing a vPC split-brain scenario. vPC split-brain scenarios can cause a variety of odd connectivity issues within the network, which is obviously undesirable.
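To make the failure combinations above easier to follow, here is a minimal Python sketch of the decision logic. It is a simplified model for illustration only, not how NX-OS is implemented. The names `Outcome` and `vpc_outcome` are made up for this example, and it assumes both vPC peers themselves are still up and running. The "secondary suspends its vPC ports" case is general vPC behavior that is not spelled out in this post.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for how the vPC domain reacts (illustration only)."""
    NORMAL = "no operational change"
    SECONDARY_SUSPENDS = "secondary peer suspends its vPC ports"
    SPLIT_BRAIN = "split-brain: each peer assumes the other is down"


def vpc_outcome(peer_link_up: bool, keepalive_up: bool) -> Outcome:
    """Simplified reaction of a vPC domain when both peers are still running."""
    if peer_link_up:
        # Losing only the Peer-Keepalive Link causes no operational change.
        return Outcome.NORMAL
    if keepalive_up:
        # Peer-Link is down, but keepalives prove the other peer is alive,
        # so the peers can still tell a link failure from a peer failure.
        return Outcome.SECONDARY_SUSPENDS
    # Peer-Keepalive was already down when the Peer-Link failed:
    # neither peer can tell that the other one is still alive.
    return Outcome.SPLIT_BRAIN


if __name__ == "__main__":
    for peer_link in (True, False):
        for keepalive in (True, False):
            print(peer_link, keepalive, vpc_outcome(peer_link, keepalive).value)
```

The point of the sketch is that the Peer-Keepalive Link only matters at the moment the Peer-Link fails, which is why a keepalive outage can go unnoticed until it is too late.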

The solution to this issue is to ensure the management interfaces of both supervisors on both vPC peers are connected to an intermediary switch. Since the management interfaces are typically used for management purposes (SSH, SNMP, syslog, etc.), this switch is typically part of a larger out-of-band management network for the entire data center. With this solution, either supervisor can be the active supervisor of its respective chassis at any given time, and connectivity through the vPC Peer-Keepalive Link will remain stable.
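The difference between the two cabling options can be sketched in a few lines of Python. This is only a toy model of the N9K-1/N9K-2 example above: the function name `keepalive_reachable` and its parameters are hypothetical, and it assumes that direct cabling connects same-numbered slots back to back (27 to 27, 28 to 28) and that only the active supervisor's management interface passes traffic.

```python
def keepalive_reachable(active_slot_a: int, active_slot_b: int,
                        via_mgmt_switch: bool) -> bool:
    """Return True if the two active supervisors can exchange keepalives.

    active_slot_a / active_slot_b: slot of the active supervisor on each chassis.
    via_mgmt_switch: True if all four management ports go to a shared switch,
    False if same-numbered slots are cabled directly to each other.
    """
    if via_mgmt_switch:
        # Every management port reaches every other one through the switch,
        # so it does not matter which supervisor is active on either side.
        return True
    # Direct cabling: the active supervisors only share a cable
    # when they sit in the same slot on both chassis.
    return active_slot_a == active_slot_b


if __name__ == "__main__":
    for a in (27, 28):
        for b in (27, 28):
            print(f"N9K-1 active={a}, N9K-2 active={b}: "
                  f"direct={keepalive_reachable(a, b, False)}, "
                  f"switch={keepalive_reachable(a, b, True)}")
```

With direct cabling, two of the four active/standby combinations break the keepalive path; with the intermediary switch, none of them do.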

I hope this helps - thank you!

-Christopher

xiexiaoyang · ‎02-22-2023

Dear Christopher

Thanks for your reply.

I think I understand. However, is there something wrong with the following statement? I may be a little confused.

“The management interface of N9K-1's supervisor in slot 27 is directly connected to the management interface of N9K-2's supervisor in slot 28. Similarly, the management interface of N9K-1's supervisor in slot 28 is directly connected to the management interface of N9K-2's supervisor in slot 28. ”

Our actual environment has two Nexus 3524 devices (which I think have only one supervisor each) running vPC with some VMware ESXi servers. Currently, the vPC Peer-Keepalive Link is directly connected through the management interfaces. Because we need to upgrade and activate licenses, we plan to restart the two devices one by one. We now need to evaluate whether the restarts will have any impact.

Waiting for your reply, sincerely.

Christopher Hart · ‎02-23-2023

Hello!

Apologies for the confusion, there was a typo in my original reply. I've edited my original reply to read as follows:

"The management interface of N9K-1's supervisor in slot 27 is directly connected to the management interface of N9K-2's supervisor in slot 27. Similarly, the management interface of N9K-1's supervisor in slot 28 is directly connected to the management interface of N9K-2's supervisor in slot 28."

You are correct that this advice does not apply to non-modular/fixed switches like the Nexus 3524. If you do not need any out-of-band management capabilities through the management interface, then you may directly connect the management interface of both vPC peers to each other. This is safe and supported - it will not cause any vPC-related issues if either switch is upgraded, reloaded, etc.

Thank you!

-Christopher

xiexiaoyang · ‎02-24-2023

Hi, Christopher

Thanks again for your reply.

sp2720401 · ‎02-22-2023

I think this is in relation to the function of the management interface itself. The OOBM management interface is part of the management plane and is what should be used to actually manage the device.

If you do not put a switch between those interfaces, then there is no way to provide OOBM access to the switch. RADIUS, SNMP, LOGGING, SSH, HTTPS/API... all of which are OOBM functions.

In an OOBM environment those functions should not be in-band, and directly connecting two management interfaces together robs you of the ability to manage your devices out of band.

The "must" is a management must and not a hard architecture must. I think this is why the very first two words are "we recommend".

xiexiaoyang · ‎02-22-2023

Dear sp2720401

Thanks for your reply.

I think this is also one of the possible reasons. However, Christopher's explanation seems to be better supported, because the picture in the Cisco document specifically distinguishes the active and standby links.

sp2720401 · ‎02-23-2023

His basis is that he is telling you about a 9500 with multiple supervisors and multiple management links. The 9500 can do ISSU without issue and the supervisor links will be active or standby depending on which supervisor is active or standby.

You can do your upgrade on your 3524, which has only one supervisor but 2 management links. Any management associated with those links will not be available during a reload. Since you say they are connected back to back and you do not use them for OOBM, you may not have an issue.

The issue you may experience is that vPC may fail when moving from one major version to another.

xiexiaoyang · ‎02-24-2023

Yes. Thanks for your reply