# of APIC Cluster Nodes

visitor68
Level 4

An instructor has just finished telling me that having 5 APIC nodes in the cluster does not offer any more redundancy than having 3 nodes. Is that accurate? It sounds counterintuitive.

Accepted Solutions

Hi,

I believe you are confused because you are thinking about redundancy from the perspective of the APIC as a unit, and not from the perspective of the shards (the database), which are the critical component when it comes to configuration.

The APIC cluster uses a technique from large databases called sharding, a concept very similar to horizontal database partitioning, but better. It is better in the sense that it increases redundancy and performance, because the database tables are split across servers and smaller tables are replicated as complete units. Cisco APIC uses a replication factor of 3, meaning each shard has 3 instances/replicas (one active and two backup) across the cluster, regardless of the number of APIC nodes present in the cluster.

The read-only state actually refers to shards. When 2 out of 3 copies of a shard are down, the remaining copy goes into a read-only state, to prevent loss of data in case the other two come back up.

In case of a 2-node failure, this is how a 3-node cluster and a 5-node cluster compare from the shards' perspective:

failure2.png

So as you can see, in a 5-node cluster some shards may be in a read-only state while others remain in read-write.
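
To make the comparison concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: it does not model how APIC actually places replicas. Instead it enumerates every possible set of 3 nodes that could hold a shard's replicas, and it assumes a shard stays read-write while at least 2 of its 3 replicas are up. The function name `shard_states` is hypothetical.

```python
from itertools import combinations

REPLICATION_FACTOR = 3  # from the post: each shard has 3 replicas


def shard_states(cluster_size, failed_nodes):
    """Count shard states over every possible 3-node replica placement.

    Assumption (not APIC's real placement logic): any 3 distinct nodes
    may hold a shard's replicas, and a shard needs 2 of 3 replicas up
    to stay read-write.
    """
    failed = set(failed_nodes)
    counts = {"read-write": 0, "read-only": 0, "lost": 0}
    for placement in combinations(range(1, cluster_size + 1), REPLICATION_FACTOR):
        alive = len(set(placement) - failed)
        if alive >= 2:
            counts["read-write"] += 1
        elif alive == 1:
            counts["read-only"] += 1
        else:
            counts["lost"] += 1
    return counts


# 3-node cluster, nodes 2 and 3 fail: the only possible placement is read-only.
print(shard_states(3, {2, 3}))  # {'read-write': 0, 'read-only': 1, 'lost': 0}

# 5-node cluster, nodes 4 and 5 fail: a mix of read-write and read-only.
print(shard_states(5, {4, 5}))  # {'read-write': 7, 'read-only': 3, 'lost': 0}
```

In this toy model the 5-node cluster still ends up with read-only shards after losing 2 nodes, which is why the extra nodes do not give you a cluster that is fully writable after a double failure.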

If we move to Multi-Pod, you need to consider the distribution of the APICs carefully to avoid the total loss of a shard. The golden rule is: do not keep 3 active APICs in the same pod, because this can happen:

lost.png

The green shard is lost if all 3 APICs in pod1 suffer a hardware failure. To recover from this state you will need to contact TAC and the BU. The procedure, called ‘ID Recovery’, restores the whole fabric state to the most recent configuration snapshot (if you have one).
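
The same idea can be sketched for the Multi-Pod case. This is again a hypothetical model, not APIC's real placement algorithm: it treats every 3-node combination as a possible replica set and lists the ones that would be wiped out if a whole pod failed. The pod layouts and the helper name `lost_placements` are made up for illustration.

```python
from itertools import combinations


def lost_placements(pods, failed_pod, replicas=3):
    """Return the replica placements that sit entirely inside the failed pod.

    pods maps a pod name to the list of APIC node IDs in that pod.
    Assumption: any `replicas` distinct nodes may hold one shard.
    """
    all_nodes = sorted(n for members in pods.values() for n in members)
    failed = set(pods[failed_pod])
    return [p for p in combinations(all_nodes, replicas) if set(p) <= failed]


# Hypothetical layouts for a 5-node cluster.
risky = {"pod1": [1, 2, 3], "pod2": [4, 5]}            # 3 active APICs in pod1
safer = {"pod1": [1, 2], "pod2": [3, 4], "pod3": [5]}  # no pod holds 3 APICs

print(lost_placements(risky, "pod1"))  # [(1, 2, 3)]
print(lost_placements(safer, "pod1"))  # []
```

With 3 APICs in one pod there is a placement whose replicas all live in that pod, so losing the pod loses the shard. With at most 2 APICs per pod, no single pod can hold all 3 replicas.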

I hope this was informative and that it is clearer now.

 

Note: I took all these details and images from the following Cisco Live presentation:

https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/unified-fabric/white-paper-c11-730021.html 

I recommend you have a look at the presentation (video). You can find it on ciscolive.com by searching for BRKACI-2003 in the on-demand library.

 

Cheers,

Sergiu

 

 

9 Replies

RedNectar
VIP

Hi @visitor68 ,

Your instructor is right!

You need to understand the difference between redundancy and capacity.

Redundancy is achieved by having three copies of every shard. It doesn't matter if you have 3, 5 or 7 APICs. There are only ever 3 copies of every shard.

Capacity is achieved by distributing the job of looking after the shards between the APICs.

Every shard is managed by a single APIC - and then replicated to two other APICs.  So the more APICs you have, the fewer shards each APIC has to manage.
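
As a rough arithmetic sketch of that point, the Python snippet below assumes shards are spread perfectly evenly across the cluster. The shard count of 30 is a hypothetical number chosen only because it divides cleanly, and `load_per_apic` is a made-up helper, not an APIC API.

```python
REPLICATION_FACTOR = 3  # fixed at 3 regardless of cluster size


def load_per_apic(total_shards, cluster_size):
    """Return (shards managed, replicas stored) per APIC.

    Assumption: a perfectly even spread of shards across all nodes.
    """
    managed = total_shards / cluster_size
    stored = total_shards * REPLICATION_FACTOR / cluster_size
    return managed, stored


for size in (3, 5):
    managed, stored = load_per_apic(30, size)
    print(f"{size} APICs: each manages {managed} shards, stores {stored} replicas")
# 3 APICs: each manages 10.0 shards, stores 30.0 replicas
# 5 APICs: each manages 6.0 shards, stores 18.0 replicas
```

The number of copies per shard never changes; only the share of work per APIC does.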

I hope this helps.


RedNectar aka Chris Welsh.

Thanks, Chris. Kindly look at my response to Sergiu. I'm missing something in my thinking...

Sergiu.Daniluk
VIP Alumni

Hello,

There was a similar discussion here about what happens when APIC controllers fail in a 5-node cluster:

https://community.cisco.com/t5/application-centric/operation-process-in-case-of-apic-cluster-failure/m-p/4062605 

From a redundancy perspective, the answer is yes: there is no advantage compared with a 3-node cluster.

But then why would you use a 5-node cluster? Because of scalability: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/4-x/verified-scalability/Cisco-ACI-Verified-Scalability-Guide-422.html#id_110978 

 

Regards,

Sergiu

 

Sergiu, I am a bit confused. Sorry. I'm looking at it from the perspective of simple numbers.

  • At least 3 APIC nodes are required, so that I can have a primary copy of the shard and 2 additional copies. Three altogether. Correct?
  • OK. If I lose 2 APIC nodes, I will be down to 1 and that means I will be in a read-only mode. Correct?
  • But if I have 5 APIC nodes and I lose 2, I will be down to 3 and still perfectly able to sustain my 3 copies and still be in a r/w mode? No?

I'm confident there is something wrong with the way I am thinking about it, but I can't quite nail it. Sorry to be a pain in the @$$.


Thank you very much, Sergiu. Sorry for the delayed response. My day job got very busy. I understand now and I appreciate the time you took to make it clear. 

 

Thanks, again!

omz
VIP Alumni

Hi @omz,

In a 5-node cluster all APICs are active, meaning the shards are distributed across all 5 APICs. A Standby APIC (more correctly, a Cold Standby) is something different. More info about how a Standby APIC operates can be found here: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/4-x/getting-started/b-Cisco-APIC-Getting-Started-Guide-411/b-Cisco-APIC-Getting-Started-Guide-411_chapter_0101.html#task_78E5005D04AA447EA12E526AB4F8E25E 

 

Regards,

Sergiu

hi @Sergiu.Daniluk 

right ok.. thanks for that.. 

I was going with what that doc said:

Cisco recommends that you have at least 3 active APICs in a cluster, along with additional standby APICs. A cluster size of 3, 5, or 7 APICs is recommended. A cluster size of 4 or 6 APICs is not recommended.
