An ACI upgrade consists of the APIC software upgrade and the switch upgrade. The switch upgrade is usually straightforward, but the APIC upgrade can run into cluster issues. Below is the pre-check list we usually recommend customers work through before starting an upgrade.
Preparations for APICs Before Upgrade:
0. Clear all faults and overlapping VLAN blocks
Faults in the ACI fabric indicate invalid or conflicting policies, disconnected interfaces and so on. Understand what triggered each fault and clear it before starting the upgrade. Be aware that conflicting policies such as "encap already been used" or "Routed port is in L2 mode" can cause an unexpected outage: after a switch is upgraded it fetches all policies from the APIC from scratch on a "first come, first served" basis, so an unintended policy may well take precedence over the intended one.
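As a rough illustration of the overlapping VLAN block check, here is a minimal Python sketch. It assumes the VLAN blocks have already been exported by hand into a dictionary of inclusive (start, end) ranges; the block names and ranges are hypothetical, and the sketch does not query the APIC.

```python
from itertools import combinations


def find_overlapping_blocks(blocks):
    """Return pairs of block names whose inclusive VLAN ranges intersect.

    blocks: dict mapping a (hypothetical) block name to a (start, end) tuple.
    """
    overlaps = []
    for (name_a, range_a), (name_b, range_b) in combinations(sorted(blocks.items()), 2):
        start_a, end_a = range_a
        start_b, end_b = range_b
        # Two inclusive ranges intersect when each one starts before the other ends.
        if start_a <= end_b and start_b <= end_a:
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical data for illustration only.
blocks = {
    "pool-A": (100, 199),
    "pool-B": (150, 250),
    "pool-C": (300, 399),
}
print(find_overlapping_blocks(blocks))
```

Any pair reported here would be a candidate to review before the upgrade; whether an overlap is actually harmful depends on how the pools are used in the fabric.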
1. Make sure the upgrade path is supported. Data conversion takes place during the upgrade, and following a supported upgrade path ensures the database is converted properly.
***Very important: read the release notes of the target APIC version.
2. Back up the APIC configuration to an external server. If we ever have to re-import the configuration, this backup is the only data from which the same configuration can be restored. If backup encryption is enabled, make sure the encryption key is saved; otherwise none of the passwords, including the admin password, will be imported properly, and we will have to reset the admin password from the CLI or via USB.
3. Make sure the CIMC of each APIC is accessible. This avoids two risks:
a. CIMC 1.5(4e) has a memory leak defect that can prevent the affected APIC (usually APIC2 and above) from kicking off the upgrade. It can also cause a process crash on APIC1 after the upgrade. The CIMC has reached the bad state if it is no longer reachable through the GUI or SSH. In that case it is very important to recover it by resetting the CIMC: disconnect the server's power cord, wait 3 minutes, and reconnect it. Upgrading the CIMC before the APIC upgrade is highly recommended.
b. Without CIMC access, we cannot reach the APIC console remotely if something goes wrong. Getting this access ready before the upgrade is critical.
4. Make sure the appliance element process is not locked by the IPMI defect
We have seen a few cases in which a CentOS defect related to IPMI locks the AE thread. AE (appliance element) is in charge of calling the upgrade utility (installer.py); if AE is locked, the upgrade will not kick in. We can confirm from the CLI whether AE is affected by the IPMI defect:
If there is no such hit in the IPMI output, or the last IPMI query to the chassis was more than 10 seconds ago compared with the current system time (obtained with date), you may want to reboot the APIC OS before triggering the upgrade. Do not reboot two or more APICs at the same time.
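The 10-second rule above can be expressed as a small check. This is only a sketch in Python: it assumes you have already read the last IPMI query time and the current system time off the APIC by hand, and the function name and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Threshold taken from the rule described above.
STALE_AFTER = timedelta(seconds=10)


def ae_looks_stuck(last_ipmi_query, now):
    """Return True if a reboot of the APIC OS should be considered before upgrading.

    last_ipmi_query: datetime of the most recent IPMI chassis query,
                     or None if no such entry was found.
    now: current system time on the same APIC.
    """
    if last_ipmi_query is None:
        return True
    return now - last_ipmi_query > STALE_AFTER


# Hypothetical timestamps for illustration only.
now = datetime(2017, 1, 1, 12, 0, 30)
print(ae_looks_stuck(datetime(2017, 1, 1, 12, 0, 25), now))  # recent query
print(ae_looks_stuck(datetime(2017, 1, 1, 12, 0, 5), now))   # stale query
print(ae_looks_stuck(None, now))                             # no hit at all
```

Both values should be taken from the same APIC so that clock differences between controllers do not distort the comparison.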
5. Make sure NTP is reachable
This avoids a known issue that may leave APIC2 and APIC3 stuck waiting. Details can be found in the troubleshooting case study below.
6. Review the behavior changes of the new version and evaluate their potential impact. For example, if route control enforcement (for L3Out) was turned on for OSPF before ACI version 2.0 (the option existed for BGP and was not greyed out for OSPF), it starts taking effect as soon as the leaf is upgraded to 2.0. All OSPF routes are then filtered out by the L3Out, which causes an outage.
7. Stage the upgrade in a lab before applying the change in production. It is always good to get familiar with the newer version by upgrading the lab first and running at least a minimal test of the applications.
Preparations for Switches Before Upgrade:
1. Place vPC/redundant pairs into different maintenance groups.
From a certain version onward, the APIC does not allow both members of a vPC pair to upgrade at the same time. It is still best practice to put vPC peers into different maintenance groups. Switches that are not a vPC pair but back each other up, such as border leaf switches, also need to be put into different groups, so that only one member is rebooted while the other remains online.
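A minimal Python sketch of this grouping rule follows. It assumes the redundant pairs are known in advance and written down as tuples of node IDs; the node IDs and function names are hypothetical and nothing here talks to the APIC.

```python
def split_into_groups(pairs):
    """Place the two members of every redundant pair into different groups.

    pairs: list of (node_a, node_b) tuples, e.g. vPC peers or border leaf pairs.
    Returns two lists of node IDs, one per maintenance group.
    """
    group_1, group_2 = [], []
    for first, second in pairs:
        group_1.append(first)
        group_2.append(second)
    return group_1, group_2


def pairs_sharing_a_group(pairs, group_1, group_2):
    """Return the pairs whose members ended up in the same group (a planning error)."""
    bad = []
    for first, second in pairs:
        same_in_1 = first in group_1 and second in group_1
        same_in_2 = first in group_2 and second in group_2
        if same_in_1 or same_in_2:
            bad.append((first, second))
    return bad


# Hypothetical node IDs for illustration only.
pairs = [(101, 102), (103, 104), (201, 202)]
group_1, group_2 = split_into_groups(pairs)
print(group_1, group_2)
print(pairs_sharing_a_group(pairs, group_1, group_2))
```

The second function can also be run against an existing group layout to spot pairs that would reboot together.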
If the upgrade fails and troubleshooting is required, always start with APIC1. If APIC1 has not finished its upgrade, do not touch APIC2. If APIC1 is done but APIC2 has not completed, do not touch APIC3. Violating this rule can break the cluster database and force a cluster rebuild.
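The ordering rule above amounts to "work on the lowest-numbered APIC that has not finished, and leave every APIC after it alone". A small Python sketch, assuming the completion status of each APIC has been collected manually (the function name is hypothetical):

```python
def next_apic_to_troubleshoot(upgrade_done):
    """Return the ID of the only APIC that may be worked on, or None if all are done.

    upgrade_done: list of booleans ordered by APIC ID (index 0 is APIC1),
                  True meaning that APIC has completed its upgrade.
    """
    for index, done in enumerate(upgrade_done):
        if not done:
            return index + 1
    return None


print(next_apic_to_troubleshoot([True, False, False]))   # APIC2 is next, leave APIC3 alone
print(next_apic_to_troubleshoot([False, False, False]))  # start with APIC1
```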
1. APIC2 or above stuck at 75% even though APIC1 has completed.
This problem can happen when APIC1's upgraded version information is not propagated to APIC2 or above. svc_ifc_appliance_director is in charge of syncing the version between APICs and storing it in a framework that the upgrade utility (and other processes) can read.
First, make sure APIC1 can ping the rest of the APICs. This determines whether we need to troubleshoot from the leaf switch or continue on the APIC itself. If APIC1 cannot ping APIC2, you may want to call TAC to troubleshoot the switch. If APIC1 can ping APIC2, move on to the second step.
Second, since the APICs can talk to each other, APIC1's version info should have been replicated to its peers but for some reason was not accepted. The version info is identified by the timestamp that follows it. We can run the CLI below to check the version timestamp of APIC1, both from APIC1 itself and from APIC2, which is waiting at 75%.
As shown above, on APIC2 the timestamp of APIC1's old version 2.0(1m) is later than the timestamp of APIC1's new version 2.0(2f). This prevents APIC2 from accepting the propagation of APIC1's newer version, so the installer on APIC2 thinks that APIC1 has not completed its upgrade yet. Instead of moving on to the data-conversion stage, APIC2 keeps waiting for APIC1. There is a workaround, which must be run from APIC1 and only when APIC1 has completed the upgrade successfully and booted into the new version. Never run it from any APIC that is waiting at 75%, as this would badly damage the cluster. Considering the risk, I suggest calling TAC instead of doing this yourself.
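To make the timestamp logic concrete, here is a simplified Python sketch of the acceptance rule as described above. It is not the actual implementation of svc_ifc_appliance_director; the function name is hypothetical, and the timestamps are invented for illustration (only the version strings come from this case).

```python
from datetime import datetime


def peer_accepts_update(stored_timestamp, incoming_timestamp):
    """Simplified rule: a peer only accepts version info that is newer than what it holds."""
    return incoming_timestamp > stored_timestamp


# What APIC2 holds for APIC1: old version 2.0(1m) with a (wrongly) late timestamp.
stored_on_apic2 = ("2.0(1m)", datetime(2017, 3, 10, 8, 0, 0))
# What APIC1 advertises after its upgrade: new version 2.0(2f) with an earlier timestamp.
sent_by_apic1 = ("2.0(2f)", datetime(2017, 3, 9, 22, 0, 0))

accepted = peer_accepts_update(stored_on_apic2[1], sent_by_apic1[1])
print(accepted)  # the update is rejected, so APIC2 still sees APIC1 at 2.0(1m)
```

Under this rule APIC2 keeps the stale 2.0(1m) record, which matches the symptom of the installer waiting at 75%.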