Recommended NDFC cluster deployment for multisite

svenus · 10-23-2023

Hi community,

I'm still getting my head around this, as the information presented by Cisco has been confusing.

With regard to a greenfield BGP EVPN-VXLAN deployment, what's the recommended approach to deploy the NDFC cluster (vND)? Centralized at one site or distributed?

I've found two contradictory recommendations:

The Cisco Live session BRKDCN-2918 recommends distributed.

The deployment guide recommends centralized: https://www.cisco.com/c/en/us/td/docs/dcn/nd/3x/deployment/cisco-nexus-dashboard-deployment-guide-301/nd-deploy-overview-30x.html

"

Node Distribution for Fabric Controller

For Nexus Dashboard Fabric Controller, we recommend a centralized, single-site deployment. This service does not support recovery in case if two primary nodes are not available and so it gains no redundancy benefits from a distributed cluster, which could instead expose the cluster to interconnection failures when nodes are in different sites."

The issue with a centralized deployment is that it provides no DR at all: when the site is down, I can no longer push config from NDFC, so any config added manually to the DR site switches will be treated as "alien" config once NDFC is back up. NDFC will then raise an out-of-sync alert and advise wiping that alien config.

With a distributed deployment, by contrast, the procedure would be to bring up the standby node and promote it to master, which gives a read-write cluster at the DR site.

Which is the right approach? Can anyone shed some light?

ADP89 · 10-29-2023

Hello svenus,

You should follow the ND 3.x deployment guide you linked, as it is the most up to date and reflects current best practices. Distributing the nodes across two sites will not change your situation if you lose the main site, because the remaining master node at the DR site will not be able to recover the application and its configuration.
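To make the 2+1 split concrete, here is a minimal Python sketch under simplified assumptions: a cluster of three primary nodes (the node names and site labels are hypothetical), plus the rule from the quoted guide that NDFC cannot recover once two primary nodes are unavailable. It only counts how many primaries each site-wide outage takes down; it does not model how Nexus Dashboard actually handles node failures.

```python
# Simplified model (assumption): three primary nodes, and per the quoted
# deployment guide, NDFC cannot recover once two primary nodes are unavailable.
MAX_LOST_PRIMARIES = 1


def site_loss_outcomes(placement: dict[str, str]) -> dict[str, bool]:
    """Map each site to whether NDFC stays recoverable if that whole site is lost."""
    outcomes = {}
    for site in sorted(set(placement.values())):
        # Count the primary nodes hosted at the site that goes down.
        lost = sum(1 for s in placement.values() if s == site)
        outcomes[site] = lost <= MAX_LOST_PRIMARIES
    return outcomes


# Hypothetical placements: one node at DR versus all nodes at the main site.
distributed = {"node1": "main", "node2": "main", "node3": "dr"}
centralized = {"node1": "main", "node2": "main", "node3": "main"}

print(site_loss_outcomes(distributed))  # {'dr': True, 'main': False}
print(site_loss_outcomes(centralized))  # {'main': False}
```

With only two sites, one of them must host at least two of the three primaries, so losing that site is unrecoverable in either layout. Distribution only adds exposure to inter-site link failures, which is the guide's argument.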

ND nodes should not be attached to the fabrics they control, to avoid fate sharing. Reachability between NDFC and the switches should be provided over a routed network.

To recover NDFC functionality if you lose the main site completely, you can restore a backup to a different NDFC cluster at the DR site.

HTH,

ADP