Single APIC in a 3-device cluster - "Data Layer Partially Diverged"

tuanquangnguyen
Level 1

Hi Community,

I recently tested a scenario in which all 3 APICs were discovered correctly. We then shut down the whole ACI fabric and powered on only half of the leaves and spines, together with a single APIC (node ID 2).

If I understand correctly, 3 APICs are required only for HA and to avoid a split-brain scenario: the ACI configuration is divided into shards, each shard is copied into 3 replicas held by the 3 APICs, and one APIC acts as the master of each shard. So if only a single APIC is up, it should become the master of all shards, and I should still be able to configure things.

But with only half of the ACI fabric powered on, I wasn't able to configure anything. The APIC status was "Data Layer Partially Diverged", and any configuration attempt failed with "The messaging layer was unable to deliver the stimulus".

So what is the point of the cluster, then? It's a rare case, but if two of my APICs suddenly failed, I wouldn't be able to configure anything, even though the network would still operate normally. The surviving APIC would simply be waiting for its own failure as well.

The APIC model is APIC-SERVER-M3 (UCS C220-M5), with Leaves being YC-EX and GC-FXP, Spines being 9332C

Accepted Solutions

Claudia de Luna
Spotlight

Hi @tuanquangnguyen,

 

While the redundancy you describe is correct, you need a quorum (2 of the 3 APICs) to actually make changes. With one APIC (without that "majority"), the fabric is in read-only mode: it can still pass traffic, but changes cannot be made. If you turn on one more APIC along with the half of the fabric you already have powered on, you should be operational again.

 

That's why you see Multi-Pod designs with a "spare" (standby) APIC at the site that has only one production APIC: if you lose the site with two APICs, you can promote that "spare" APIC and regain the ability to make changes.
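
To make the quorum point concrete, here is a minimal sketch of the majority rule, assuming a simplified model in which a shard is writable only while a strict majority of its replicas is reachable. The names `has_quorum` and `shard_mode` are made up for illustration and are not part of any APIC API or CLI.

```python
def has_quorum(cluster_size: int, healthy: int) -> bool:
    """Return True when the healthy replicas form a strict majority."""
    return healthy > cluster_size // 2


def shard_mode(cluster_size: int, healthy: int) -> str:
    """Simplified model: a shard is writable only with a majority of replicas."""
    return "read-write" if has_quorum(cluster_size, healthy) else "read-only"


if __name__ == "__main__":
    # Walk through a 3-APIC cluster with 0 to 3 controllers reachable.
    for healthy in range(4):
        print(f"{healthy} of 3 APICs up: {shard_mode(3, healthy)}")
```

In this model a single surviving APIC out of 3 stays read-only, which matches the behaviour seen in the test, while 2 of 3 is enough to accept changes again.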


Hi @Claudia de Luna,

Thanks for your response. I also read up on this topic in the Cisco Live slides, and next time I will try powering on two APICs instead of one.

 

Yes, good information there.

 

The Multi-Pod white paper has an entire chapter on this, and it's good information as well.

https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-737855.html#APICClusterDeploymentConsiderations

6askorobogatov
Level 1

One possible solution is to have 4 APICs: 3 active and 1 standby. If half of the fabric (assuming one data centre) goes down, the 4th APIC is promoted to active and the fabric is fully functional again.

This is not theory; it is a fully functioning real-life installation.
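
As a rough illustration of why the standby helps, the sketch below models a two-site layout under stated assumptions: two active APICs in site A, one active plus one standby in site B, a target cluster size of 3, and a promotion step in which the standby takes the place of one failed active controller. The class, the function names and the site placement are all hypothetical; this is a toy model, not how the APIC implements promotion.

```python
from dataclasses import dataclass


@dataclass
class Apic:
    name: str
    site: str
    role: str  # "active", "standby" or "replaced" (labels used by this toy model only)
    up: bool = True


def build_cluster() -> list:
    """Assumed layout: 2 active in site A, 1 active + 1 standby in site B."""
    return [
        Apic("apic1", "A", "active"),
        Apic("apic2", "A", "active"),
        Apic("apic3", "B", "active"),
        Apic("apic4", "B", "standby"),
    ]


def writable(apics: list, target_size: int = 3) -> bool:
    """Changes are accepted only while a majority of active APICs is up."""
    healthy_active = sum(1 for a in apics if a.role == "active" and a.up)
    return healthy_active > target_size // 2


def fail_site(apics: list, site: str) -> None:
    """Mark every APIC in the given site as down."""
    for a in apics:
        if a.site == site:
            a.up = False


def promote_standby(apics: list) -> bool:
    """Swap a healthy standby in for one failed active APIC, if both exist."""
    standby = next((a for a in apics if a.role == "standby" and a.up), None)
    failed = next((a for a in apics if a.role == "active" and not a.up), None)
    if standby is None or failed is None:
        return False
    failed.role = "replaced"
    standby.role = "active"
    return True


if __name__ == "__main__":
    cluster = build_cluster()
    fail_site(cluster, "A")
    print("after losing site A:", "read-write" if writable(cluster) else "read-only")
    promote_standby(cluster)
    print("after promotion:", "read-write" if writable(cluster) else "read-only")
```

With site A down, only one of the three active APICs is left, so the model is read-only; after the standby is promoted, two of three are up and changes are possible again.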
