12-10-2017 03:31 AM - edited 03-01-2019 05:24 AM
Recently I found a post about a standby APIC in a Multipod topology. The technique is based on using the same Node ID (2, for example) for a fourth APIC in the second pod, and keeping that node powered down and separated from the fabric. During the disaster recovery procedure, the former APIC with Node ID 2 is decommissioned and the new one is commissioned under the same Node ID.
I've also found that starting with the 2.x release there is a true standby role for the APIC.
According to the link above, this role is recommended to be used in the same pod where the node being replaced is located. However, I've also seen in BRKACI-2603 (page 18) that it can be used in a Multipod deployment.
Is it possible to use the true standby role in a Multipod topology? Will the replace procedure use the right pod number if the replaced APIC is in Pod 1 but the standby APIC is located in Pod 2?
Also, does the Pod ID of an APIC really matter? I can't understand its purpose, since all APICs use the same TEP pool from Pod 1.
Thanks in advance!
12-11-2017 09:18 AM
Shutikov,
I think there may be some minor confusion between the two documents you have referenced. The first document refers to a stretched fabric design (not Multipod), in which all switch nodes exist within a single pod. The stretched fabric design only requires transit leaves, which allow you to extend the physical reach of the fabric while still staying within a single pod. This design existed before the Multipod feature did.
The Multipod feature set and design requires an IPN (Inter-Pod Network) to join the distinct pods to each other and allow them to be configured as a single ACI fabric. For the Multipod design and feature set, the Pod ID is used to denote switch node placement, as the switch nodes in each pod have a unique TEP pool separate from Pod 1 (APICs use Pod 1; switches use Pod 2/3, etc., depending on their placement and the design being followed).
Finally, for the standby controller, the documentation mentions that it is recommended to keep the standby in the same pod as the active controller it would replace. That refers specifically to the Pod ID associated with the switch nodes (TEP pool), which is typically in a separate physical location.
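If it helps to make that concrete, the per-pod switch TEP pools can be checked over the APIC REST API. A minimal sketch (the hostname "apic1" and the admin credentials are placeholders, and I'm assuming the pools show up under the fabricSetupP class, where the pod fabric setup policy lives):

# log in and store the session cookie
curl -sk -c cookie.txt -X POST https://apic1/api/aaaLogin.json \
  -d '{"aaaUser":{"attributes":{"name":"admin","pwd":"password"}}}'

# list the configured pods and their switch TEP pools (podId / tepPool attributes)
curl -sk -b cookie.txt 'https://apic1/api/class/fabricSetupP.json'

Each pod should come back with its own tepPool, which is the pool its switch nodes draw their TEP addresses from.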
This doc refers more to multipod design and considerations:
-Gabriel
12-11-2017 10:05 AM - edited 12-11-2017 10:15 AM
Gabriel, thanks for the reply.
As I can see, the provided whitepaper contains the following text (below Figure 23); this is what I meant in my first post:
"The specific procedure required to bring up the standby node and have it joining the cluster is identical to what described for the ACI stretched Fabric design option at the link below (same applies to the procedure to be followed to eventually recover Pod1):"
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/kb/b_kb-aci-stretched-fabric.html#concept_524263C54D8749F2AD248FAEBA7DAD78
So the Multipod whitepaper references the stretched fabric design when it comes to standby APIC placement and use. But I would call that "cold" standby technology.
My question was about "warm" standby (HA), which appeared in the 2.x release.
Let's imagine Pod 1 and Pod 2. APIC1 and APIC2 are in Pod 1, and APIC3 and a "warm" standby APIC are in Pod 2.
After Pod 1 has gone, I have only APIC3 and the "warm" standby APIC in Pod 2.
To get rid of the minority state, can I assign Node ID 2 to the "warm" standby APIC in Pod 2 after decommissioning the former APIC2 (which was in Pod 1) from APIC3? Which pod assignment will this "new" APIC2 get?
Will it work at all? It violates the recommendation about placing the standby APIC in the same pod as the APIC it would replace.
Also, about the Pod ID: I understand that switches in different pods use this number to choose a TEP address from the right pool.
As we can see in the "Cisco Application Centric Infrastructure Multipod Configuration White Paper", Pod ID 2 is assigned to the APICs in Pod 2.
https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739714.html#_Toc495712863
But what is the purpose of assigning a different Pod ID to an APIC? No matter which Pod ID is assigned, the APICs use the TEP pool from Pod 1.
We can even build a Multipod fabric with all of the APICs located in Pod 1, and it will work. So why not just assign Pod ID 1 to all of the APICs, no matter which pod they are physically placed in?
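To illustrate the point, here is a rough sketch of how one could check this over the REST API (the controller name and credentials are made up). Querying topSystem for the controllers should show their actual TEP addresses (the address attribute) coming from the Pod 1 pool, even when a different podId is configured:

# log in and keep the session cookie
curl -sk -c cookie.txt -X POST https://apic1/api/aaaLogin.json \
  -d '{"aaaUser":{"attributes":{"name":"admin","pwd":"password"}}}'

# list only the controllers and compare podId with the TEP address
curl -sk -b cookie.txt \
  'https://apic1/api/class/topSystem.json?query-target-filter=eq(topSystem.role,"controller")'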
12-16-2017 04:17 AM
After experimenting in the field, I can confirm that "warm" standby works in a Multipod topology.
The topology was: Pod 1 - apic1 and apic2; Pod 2 - apic3 and apic21 (standby).
We did two tests. The first was replacing the "alive" controller apic2 in Pod 1 with the standby controller apic21 from Pod 2, and then reverting back. Successful.
The second test was replacing a "dead" controller apic2 in Pod 1 with the standby controller apic21 from Pod 2, and then reverting back. Also successful.
It is worth noting that after the replace procedure the standby controller took the Node ID of the replaced APIC but kept its own name and Pod ID (in our case it took Node ID 2 but kept the name apic21 and Pod ID 2).
All tests were done from the apic3 GUI, under System -> Controllers -> apic3 -> Cluster as Seen by Node (a scripted way to read the same view is sketched right after the table below).
Initial state:
"Node ID" "Node Name" "Health state"
1 apic1 fully-fit
2 apic2 fully-fit
3 apic3 fully-fit
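For reference, the same view can be pulled with a script instead of the GUI. A minimal sketch, with the hostname and credentials as placeholders, and assuming the "Cluster as Seen by Node" page is backed by the infraWiNode class:

# log in to apic3 and store the session cookie
curl -sk -c cookie.txt -X POST https://apic3/api/aaaLogin.json \
  -d '{"aaaUser":{"attributes":{"name":"admin","pwd":"password"}}}'

# dump cluster membership and health as apic3 sees it (look at the health attribute, e.g. fully-fit)
curl -sk -b cookie.txt 'https://apic3/api/class/infraWiNode.json'

We used the GUI for everything below; this is just an alternative way to watch the cluster state.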
Whole procedure:
1) Selected apic2, chose "Replace" from the context menu, picked standby apic21 from the drop-down list, and left "Retain OOB address" checked.
After submitting, the cluster was processing the changes for up to 10 minutes. The apic2 controller was powered down during this (but not cleared/erased).
apic21 disappeared from the standby controllers and appeared as "In-Service" among the other controllers; it also kept its OOB management address.
State after step1:
"Node ID" "Node Name" "Health state"
1 apic1 fully-fit
2 apic21 fully-fit
3 apic3 fully-fit
2) To understand the fabric behaviour, we powered on apic2 again. We didn't notice any error messages in the fabric; it looked like the fabric didn't want to negotiate with the excluded apic2 at all and kept it isolated.
3) To revert everything back, we did a "Decommission" of apic21, and a minute later a "Reset" (both commands from the context menu).
4) We manually powered down apic21.
5) Then "Commission" on the same row as in step 3, and apic2 successfully joined back into the fabric.
State after step5:
"Node ID" "Node Name" "Health state"
1 apic1 fully-fit
2 apic2 fully-fit
3 apic3 fully-fit
6) To bring apic21 back to the standby state in the fabric, we ran "acidiag touch clean" and "acidiag touch setup" in its CLI and rebooted, then went through the setup dialog on apic21 again to set it up as standby (see the CLI sketch after this list).
7) apic21 appeared as a standby controller in the fabric GUI and we "Approved" (or "Accepted", I don't remember exactly) it.
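For completeness, the CLI part of step 6 looked roughly like this, run on apic21 itself over console or SSH (the reboot command is from memory; a manual power cycle would also do):

# wipe the controller configuration and re-arm the first-boot setup dialog
acidiag touch clean
acidiag touch setup
# reboot; when the controller comes back up, run through the setup dialog and choose the standby role
acidiag reboot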
The sequence for both tests ("alive", "dead") was the same; to simulate the "dead" state we just disabled the leaf interfaces toward apic2.
We also noticed a bug on 3.0.2(k): if you cancel the replace procedure after viewing the standby APIC in the drop-down menu, the drop-down will be empty the next time, which makes it impossible to run the replace procedure. Workaround: log off from the GUI and log in again.
The documentation says that standby APICs are upgraded automatically after the ordinary ones are upgraded. We didn't check this, as we were already running the latest version.
We also haven't tried separating the pods to get rid of the minority state, but I think the procedure at the minority pod (where apic3 and standby apic21 are located) would be like this:
1) Log in to apic3.
2) In the apic3 GUI, use "Replace" to replace apic2 with standby apic21.
3) You're done.
Maybe an apic2 "Decommission" will be needed before step 2. Not sure.
If somebody has already done such a procedure in a Multipod topology, please share your experience.
04-08-2019 07:00 AM - edited 04-08-2019 07:00 AM
Hi All,
Regarding the open question "Maybe an apic2 "Decommission" will be needed before step 2. Not sure":
It's a yes: I needed to do the decommission in order to successfully replace the APIC.
Here are my notes from the action:
https://howdoesinternetwork.com/2019/aci-multipod-enable-standby-apic
Hope this helps.
cheers!!