10-07-2019 04:20 AM
Hi,
According to the Multi-Pod white paper, the recommended design for a two-site Multi-Pod topology uses four APICs, two at each site, with one of the two at the secondary site configured as a standby APIC. The paper also says: "It is important to reiterate the point that the standby node should be activated only when Pod1 is affected by a major downtime event." What are the real implications? My customer runs disaster recovery tests at the main data center, and these require promoting the standby APIC at the secondary DC to active so that the fabric regains write access and network changes can be applied during the test.
What are the implications (drawbacks) of promoting the standby APIC? What would be the procedure to bring the 4th APIC back to standby mode?
Thanks.
10-07-2019 06:29 AM
Hi @Antonio Macia,
It's good to see customers executing real failure scenarios! The repeated warnings about activating the standby APIC are about corrupting the cluster or the shards (the database "slices" that make up your fabric's configuration). This can happen if you promote the standby APIC when your failure scenario actually took down only the IPN, not the data center hosting the two Pod1 APICs, and the IPN is then restored. It is even worse if you have made changes on one side or the other while in this "split brain" state.
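To make the split-brain risk concrete, here is a simplified Python model (an illustration, not APIC code). It assumes each configuration shard has one replica on each of the three active APICs (node IDs 1, 2 and 3) and that a shard accepts writes only while a strict majority of its replicas can reach each other; a promoted standby is modeled as taking over node ID 2. The function names and scenario labels are hypothetical.

```python
# Simplified quorum model of an APIC cluster split (illustration only, not APIC code).
# Assumption: every shard has one replica on each active controller (node IDs 1-3)
# and a shard accepts writes only while a strict majority of its replicas is reachable.

REPLICAS_PER_SHARD = 3


def writable(reachable_replicas: int, total_replicas: int = REPLICAS_PER_SHARD) -> bool:
    """Return True if a partition holding this many replicas still has a write majority."""
    return reachable_replicas > total_replicas // 2


def partition_status(partitions: dict[str, set[int]]) -> dict[str, bool]:
    """Map each isolated partition to whether its shards are read-write."""
    return {name: writable(len(nodes)) for name, nodes in partitions.items()}


def split_brain(partitions: dict[str, set[int]]) -> bool:
    """Split brain: more than one isolated partition accepts configuration changes."""
    return sum(partition_status(partitions).values()) > 1


# Each scenario lists the active node IDs that can still talk to each other.
# A promoted standby takes over node ID 2, so ID 2 may appear on both sides.
scenarios = {
    "normal operation": {"both pods": {1, 2, 3}},
    "Pod1 lost, no promotion": {"pod2": {3}},
    "Pod1 lost, standby promoted as node 2": {"pod2": {2, 3}},
    "IPN down only, no promotion": {"pod1": {1, 2}, "pod2": {3}},
    "IPN down only, standby promoted as node 2": {"pod1": {1, 2}, "pod2": {2, 3}},
}

for label, parts in scenarios.items():
    print(f"{label}: {partition_status(parts)}, split brain: {split_brain(parts)}")
```

In this model, the last scenario is the dangerous one: with only the IPN down, each side sees two of the three replicas, so both sides accept changes.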
What your client wants to do is certainly possible, but it must be done with care.
The only drawback of promoting your standby APIC comes from not following the process exactly. It is also time-consuming, and you need to make sure you always have CIMC access to all of your APICs or be ready to head into the DCs :D (you may already be there for the failure testing, but it's a good idea to make sure all of that works regardless). Also, make sure you have a backup just in case.
At this point you can configure the original DC1 APIC2 as a standby so it gets a copy of the data, and then essentially repeat the process above to promote it back to APIC2. That wipes the current APIC2 (the former standby), after which you configure the original standby as a standby again.
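The rollback sequence described above boils down to a series of role swaps. The following Python sketch tracks only which physical controller holds which role (it issues no APIC commands and does not model reachability); the box names and the helpers `promote_standby` and `add_standby` are hypothetical, and it assumes a promoted standby always takes over the node ID of the APIC it replaces (APIC2 here).

```python
from typing import Dict, Optional

# Hypothetical bookkeeping of which physical box holds which cluster role.
# This only tracks roles; it does not issue any APIC commands.
Roles = Dict[str, Optional[str]]


def promote_standby(roles: Roles, node_role: str) -> Roles:
    """The standby box takes over node_role; the box that held it drops out of the cluster."""
    if roles["standby"] is None:
        raise ValueError("no standby configured")
    updated = dict(roles)
    updated[node_role] = roles["standby"]
    updated["standby"] = None
    return updated


def add_standby(roles: Roles, box: str) -> Roles:
    """Register a freshly (re)initialized box as the standby controller."""
    if roles["standby"] is not None:
        raise ValueError("a standby is already configured")
    if box in roles.values():
        raise ValueError(f"{box} already holds an active role")
    updated = dict(roles)
    updated["standby"] = box
    return updated


# Hypothetical box names: two in DC1, two in DC2.
original: Roles = {"APIC1": "dc1-a", "APIC2": "dc1-b", "APIC3": "dc2-c", "standby": "dc2-d"}

# DR test: the DC2 standby is promoted to replace APIC2.
during_test = promote_standby(original, "APIC2")

# Rollback, following the sequence in the answer:
step1 = add_standby(during_test, "dc1-b")  # original APIC2 rejoins as standby and syncs
step2 = promote_standby(step1, "APIC2")    # it becomes APIC2 again; dc2-d is wiped
restored = add_standby(step2, "dc2-d")     # original standby is configured as standby again

print(during_test)
print(restored == original)
```

The checks in `add_standby` are a design choice of this sketch: they reject a second standby or a box that still holds an active role, which makes an out-of-order rollback fail loudly.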
Here is a good write-up by Valter Popeskic:
https://howdoesinternetwork.com/2019/aci-multipod-enable-standby-apic
10-07-2019 08:10 AM
Thanks for your reply, Claudia. Valter's post clears things up, and I now understand the procedure.
Regards.