04-13-2020 01:51 PM
An instructor has just finished telling me that having 5 APIC nodes in the cluster does not offer any more redundancy than having 3 nodes. Is that accurate? It sounds counterintuitive.
04-13-2020 11:37 PM - edited 04-13-2020 11:39 PM
Hi,
I believe you are confused because you are thinking about redundancy from the perspective of the APIC as a unit, and not from the perspective of the shards (the database), which are the critical component when it comes to configuration.
The APIC cluster uses a technique from large databases called sharding. It is very similar to horizontal database partitioning, but better, in the sense that it increases redundancy and performance because the database tables are split across servers, and smaller tables are replicated as complete units. Cisco APIC uses a replication factor of 3, meaning each shard has 3 instances/replicas (one active and two backup) across the cluster, regardless of the number of APIC nodes present in the cluster.
The read-only state actually refers to shards. When 2 out of 3 copies of a shard are down, the remaining copy goes into a read-only state, to prevent loss of data in case the other two come back up.
In case of a 2-node failure, this is how a 3-node cluster vs. a 5-node cluster will look from the shards' perspective:
So as you can see here, in a 5-node cluster there is a chance that some shards will be read-only and others read-write.
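To make the 3-node vs. 5-node comparison concrete, here is a minimal Python sketch of the rule described above. The state rule (1 of 3 replicas left means read-only) comes from this post; modeling the replica sets as every possible 3-node combination is only my assumption for illustration, not a statement of how APIC actually places shards, and the function names are made up.

```python
from itertools import combinations

REPLICATION_FACTOR = 3  # from the post: every shard has 3 replicas


def shard_state(replicas_up):
    """State of one shard given how many of its 3 replicas are reachable."""
    if replicas_up == 0:
        return "lost"
    if replicas_up == 1:
        return "read-only"
    return "read-write"  # 2 or 3 replicas up: the majority is available


def states_after_failure(cluster_size, failed):
    """Map each possible replica set to its state once `failed` nodes are down.

    Assumption: replica sets are modeled as all 3-node combinations of the
    cluster. This is a hypothetical placement, used only to show the effect.
    """
    nodes = range(1, cluster_size + 1)
    result = {}
    for replica_set in combinations(nodes, REPLICATION_FACTOR):
        up = sum(1 for node in replica_set if node not in failed)
        result[replica_set] = shard_state(up)
    return result


three = states_after_failure(3, failed={1, 2})
five = states_after_failure(5, failed={1, 2})

print(three)
print(sorted(rs for rs, state in five.items() if state == "read-only"))
```

In the 3-node case the only replica set is (1, 2, 3), so every shard ends up read-only. In the 5-node case only the replica sets that contain both failed nodes go read-only, and the rest stay read-write, which is the mixed state mentioned above.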
If we change to Multi-Pod, you need to consider the distribution of the APICs carefully to avoid the total loss of a shard. The golden rule is: do not keep 3 active APICs in the same pod, because this can happen:
The green shard is lost if all 3 APICs in pod1 suffer a hardware failure. To recover from this state you will need to contact TAC and the BU. The procedure, called ‘ID Recovery’, restores the whole fabric state to the most recent configuration snapshot (if you have one).
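The multi-pod risk can be checked the same way. This is a rough Python sketch, assuming a hypothetical layout with APICs 1-3 in pod1 and APICs 4-5 in pod2; the shard names, their replica placement, and the function name are invented for illustration and are not taken from a real fabric.

```python
def shards_confined_to_one_pod(placements, pod_of):
    """Return {shard: pod} for shards whose replicas all sit in a single pod.

    Such a shard is lost completely if every APIC in that pod fails.
    """
    at_risk = {}
    for shard, replica_nodes in placements.items():
        pods = {pod_of[node] for node in replica_nodes}
        if len(pods) == 1:
            at_risk[shard] = pods.pop()
    return at_risk


# Hypothetical layout: 3 active APICs in pod1, 2 in pod2.
pod_of = {1: "pod1", 2: "pod1", 3: "pod1", 4: "pod2", 5: "pod2"}

# Hypothetical replica placement for three shards.
placements = {
    "green": (1, 2, 3),  # all replicas in pod1
    "blue": (1, 2, 4),
    "red": (3, 4, 5),
}

print(shards_confined_to_one_pod(placements, pod_of))
```

With this layout only the "green" shard is flagged, because it is the only one with no replica outside pod1.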
I hope this was informative and that things are clearer now.
Note: I took all these details and images from the following ciscolive presentation:
I recommend you have a look at the presentation (video). You can find it on ciscolive.com by searching for BRKACI-2003 in the on-demand library.
Cheers,
Sergiu
04-13-2020 01:59 PM
Hi @visitor68 ,
Your instructor is right!
You need to understand the difference between redundancy and capacity.
Redundancy is achieved by having three copies of every shard. It doesn't matter if you have 3, 5 or 7 APICs. There are only ever 3 copies of every shard.
Capacity is achieved by distributing the job of looking after the shards between the APICs.
Every shard is managed by a single APIC - and then replicated to two other APICs. So the more APICs you have, the fewer shards each APIC has to manage.
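As a rough illustration of the capacity point, here is a small Python sketch. The shard count of 30 and the round-robin placement are assumptions made up for the example; the only facts taken from the thread are that each shard is managed by one APIC and replicated to two others.

```python
from collections import Counter

REPLICATION_FACTOR = 3  # one managing APIC plus two replicas, per the thread


def load_per_apic(shard_count, cluster_size):
    """Count managed shards and stored replicas per APIC.

    Assumption: shards are placed round-robin, with shard i managed by
    APIC (i mod cluster_size) + 1 and replicated to the next two APICs.
    This placement is hypothetical, chosen only to show the trend.
    """
    managed = Counter()
    stored = Counter()
    for shard in range(shard_count):
        members = [(shard + k) % cluster_size + 1 for k in range(REPLICATION_FACTOR)]
        managed[members[0]] += 1
        for node in members:
            stored[node] += 1
    return managed, stored


for size in (3, 5):
    managed, stored = load_per_apic(30, size)
    print(size, dict(managed), dict(stored))
```

With 30 shards, each APIC in a 3-node cluster manages 10 shards and stores 30 replicas, while each APIC in a 5-node cluster manages 6 and stores 18. The number of copies per shard stays at 3 in both cases.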
I hope this helps
Don't forget to mark answers as correct if it solves your problem. This helps others find the correct answer if they search for the same problem
04-13-2020 07:52 PM
Thanks, Chris. Kindly look at my response to Sergiu. I'm missing something in my thinking...
04-13-2020 02:00 PM
Hello,
There was a similar discussion about what happens during failures of APIC controllers in a 5-node cluster here:
From a redundancy perspective, the answer is yes: there is no advantage compared with a 3-node cluster.
But then why would you use a 5 node cluster? Because of scalability: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/4-x/verified-scalability/Cisco-ACI-Verified-Scalability-Guide-422.html#id_110978
Regards,
Sergiu
04-13-2020 07:51 PM - edited 04-13-2020 07:56 PM
Sergiu, I am a bit confused. Sorry. I'm looking at it from the perspective of simple numbers.
I'm confident there is something wrong with the way I am thinking about it, but I can't quite nail it. Sorry to be a pain in the @$$
04-19-2020 05:29 AM
Thank you very much, Sergiu. Sorry for the delayed response. My day job got very busy. I understand now and I appreciate the time you took to make it clear.
Thanks, again!
04-13-2020 02:02 PM
that's correct.. only 3 controllers are active, the rest are standby .. recommended having a cluster of 3,5,7 .. not 2,4,6..
4 node cluster from ACI release 4.1
https://www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2019/pdf/BRKACI-2003.pdf
04-13-2020 02:23 PM
Hi @omz,
In a 5-node cluster all APICs are active, meaning the shards are distributed across all 5 APICs. A Standby APIC (or, to use the correct name, Cold Standby) is something different. More info about how a Standby APIC operates can be found here: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/4-x/getting-started/b-Cisco-APIC-Getting-Started-Guide-411/b-Cisco-APIC-Getting-Started-Guide-411_chapter_0101.html#task_78E5005D04AA447EA12E526AB4F8E25E
Regards,
Sergiu
04-13-2020 02:31 PM
right ok.. thanks for that..
I was going with what it said in that doc -
Cisco recommends that you have at least 3 active APICs in a cluster, along with additional standby APICs. A cluster size of 3, 5, or 7 APICs is recommended. A cluster size of 4 or 6 APICs is not recommended.