03-06-2023 07:43 AM
Hello, we have an APIC cluster of 5 nodes. My question, if 2 fail:
hope it is clear
br + thx
03-06-2023 10:11 AM
If you have a 5-node cluster and 2 nodes fail, then some data shards will go into read-only (R/O) mode. You'll notice this when you try to change certain objects in the config and the submit throws a cluster health fault. Other objects, whose shards are spread across 2 or more of the remaining controllers, will remain fully R/W. You can verify shard health by looking at 'acidiag rvread'. RV = Replica Vector.
Scalability doesn't change: you're still configured for 5 nodes regardless of how many have failed, so the scale limits remain the same.
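To make the majority rule above concrete, here is a minimal Python sketch. It is only a toy model, not how the APIC actually places data: it assumes each shard has 3 replicas, and it invents one hypothetical shard for every possible 3-controller placement in a 5-node cluster. The function name `shard_states` and the constant `REPLICAS_PER_SHARD` are made up for illustration.

```python
from itertools import combinations

# Assumption for this sketch: every shard keeps 3 replicas.
REPLICAS_PER_SHARD = 3


def shard_states(cluster_size, failed):
    """Return {placement: "rw" | "ro"} for one hypothetical shard per placement.

    A shard stays read-write only while a majority of its replicas
    live on controllers that are still up.
    """
    nodes = range(1, cluster_size + 1)
    states = {}
    for placement in combinations(nodes, REPLICAS_PER_SHARD):
        alive = [n for n in placement if n not in failed]
        has_majority = len(alive) > REPLICAS_PER_SHARD // 2
        states[placement] = "rw" if has_majority else "ro"
    return states


# Controllers 4 and 5 are down: shards with both of them as replicas go R/O.
two_down = shard_states(5, failed={4, 5})
read_only = sorted(p for p, s in two_down.items() if s == "ro")
print(read_only)

# Controller 4 has been replaced (e.g. by a standby); only 5 is still down.
one_down = shard_states(5, failed={5})
print(all(s == "rw" for s in one_down.values()))
```

In this model the only shards that drop to R/O are those whose placement includes both failed controllers. Once one failed controller is back, every shard has at least 2 of its 3 replicas again and the whole cluster is R/W.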
Robert
03-07-2023 12:29 AM
OK. If I understand you correctly, by default, even with 3 controllers still working, they do not take over the shards that have only one replica left active?
Furthermore, a standby controller would help, because I can replace a failed one with it and get R/W again?
03-07-2023 05:00 AM
Correct. Shard distribution doesn't change unless the cluster size changes. If a controller fails, the expectation is that it will come back into operation. There's also the concern about scale support. The only reason to extend beyond 3 APICs is to accommodate a scale increase. If all shards were replicated to the remaining controllers in a larger cluster, it could quickly exceed the supported/tested scale limits for a controller. Leaving the distribution alone also avoids unnecessary DB shuffling activity. If you change the cluster size, then the lost shards will be replicated from the remaining copy on the master, but the fabric scale would be decreased as well.
Replacing a failed controller with a standby would help you regain full R/W operations as you would be restored to a majority across all shard replicas.
Robert