Solved: Cisco DNA Center Upgrade from 1.3.3.1 to 2.1.2.0. Issues: stuck at 56 percent, shows 1.3.3.5 code and locks out admin accounts

Art Astafiev · ‎09-29-2020

Hi, since version 2.1.2.0 is new, I would like to share a couple of things from my experience upgrading from 1.3.3.x to 2.1.2.0. Chances are this post will save someone some time, because this information is not documented in the upgrade guide.

At the moment, to upgrade from an earlier version to 2.1 code, you first need to open a TAC case, which will give you a link to a software access form. To complete this form you need CLI access to your system, because you have to run some commands that produce a special ID for your system, which you then enter on the form. The process is relatively easy and is explained well on the form. After that it may take a few days before the BU approves your request on the back end. You will know it was approved because in System Settings > Upgrades you will start seeing a message telling you that you can go to 2.1.

According to the upgrade guide the process is simple: two clicks. After the first click you wait 30 seconds, and after the second click you wait 4-6 hours, even if the message tells you 2-3 hours. This is the easy part - just plan accordingly.

Confusing part #1 - right before the upgrade starts, the message will say something like "Upgrade to 1.5.208". This is fine, because 1.5.208 is the system version that corresponds to 2.1.2.0.

Confusing part #2 - during the upgrade the system stays on the 56 percent message for a few hours. A little scary - just wait longer, or don't look until 6 hours have passed.

Confusing part #3 - after the upgrade is done you log in to the GUI and don't see the 2.1 GUI. Then you look at the System About page and see that you are on 1.3.3.5 - wow. This is normal for this phase of the upgrade, since the UI package has not been upgraded yet; only the core components are running 2.1 at this point. So pay no attention and do not panic.

Confusing part #4 - after the upgrade, users who log in via AD authentication will work, but local admin users will not. This is expected, because some things were made more secure internally. To fix this, SSH to any DNAC server and run the command

"rbac external_auth_fallback enable" (If you have multiple servers you only need to run the command on one of them.) This should restore admin account login to the GUI.

Confusing part #5 - after these 6 hours you are not done yet. That was phase 1, the system core upgrade. Now you need to upgrade the application packages; this takes another hour for downloads and another few hours for the application upgrades. Until you do this, your DNA Center will fail on pretty much everything, because it is not supposed to work in this mode.

So save yourself time and proceed with the second phase of the upgrade immediately after you are done with the first. It is worth mentioning that you don't need to take a backup in the middle, because this is an intermediate point and you will not be able to return to it anyway. If you ever need to roll back, you should roll back to the previous code, so make sure you took a backup before starting the first phase of the upgrade.

The steps in the 2nd phase of the upgrade are also very simple - go to System Settings > Upgrades and click to download all software packages. After the download completes, click Upgrade and wait for hours. In our case phase 2 took 4.5 hours.
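For planning a maintenance window, the durations mentioned above can simply be added up. This is only a rough sketch using the numbers from this post (upper bound of 6 hours for phase 1, about 1 hour of downloads, 4.5 hours for phase 2); the names and the optional safety margin are illustrative assumptions, and your timings may differ.

```python
# Rough maintenance-window estimate from the durations reported in this post.
# These are one site's observations, not Cisco guidance.
PHASES_HOURS = {
    "phase 1: system core upgrade (4-6 h observed, upper bound)": 6.0,
    "phase 2: package download (about 1 h)": 1.0,
    "phase 2: application upgrade (4.5 h observed)": 4.5,
}


def total_window_hours(phases, safety_margin=0.0):
    """Sum the phase durations and add an optional fractional safety margin."""
    return sum(phases.values()) * (1.0 + safety_margin)


if __name__ == "__main__":
    for name, hours in PHASES_HOURS.items():
        print(f"{hours:4.1f} h  {name}")
    print(f"{total_window_hours(PHASES_HOURS):4.1f} h  total without margin")
    print(f"{total_window_hours(PHASES_HOURS, 0.25):4.1f} h  total with 25% margin")
```

Without a margin this comes to 11.5 hours, so both phases are unlikely to fit in a single short change window.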

TAC says that during the upgrade you can SSH to the server and run the command "maglev package status". During normal operation this command shows package information: the version of each package and whether it is deployed or not. During the upgrade I had stability issues with this command, because it runs via the UI API and requires GUI credentials, so it kept asking me for my GUI admin account and password, but every 10 seconds the connection was reset. So I couldn't really run it successfully during the upgrade, only after the upgrade was completed.

Good luck.

Xividar · ‎09-29-2020

I assume you're running EFT code? I believe 2.x is going to GR this week - hopefully the upgrade from previous versions will be a little smoother. If you Google Cyclops DNA, there are plenty of bugs in the wild

Art Astafiev · ‎09-30-2020

We have an on-prem multi-node deployment. Not cloud. Nobody mentioned EFT. The release notes are dated mid-September.

https://www.cisco.com/c/en/us/td/docs/cloud-systems-management/network-automation-and-management/dna-center/2-1-2/release_notes/b_cisco_dna_center_rn_2_1_2.html

Anyway, so far I can say that they made a lot of important improvements - it is worth upgrading. After the upgrade we hit one more bug, but I am not sure whether we already had it on 1.3.3.1 or got it during the upgrade. It only affects multi-node clusters.

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvt58303

You can check whether you have it by SSHing to each node and running the CLI command "ip addr | grep enp", which shows the IPs assigned to the interfaces. The bug is that the same VIP address shows up on more than one node at the same time. The system has two VIPs: one is your VIP for external management connections (your access to the UI), and the second is a VIP with an internal address, used for the internal application pool. Both VIPs, or just one, may have the problem. It causes connections to the VIP to reset periodically, which may cause other issues in DNA Center subsystems.
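If you save the output of that command from each node, the comparison can be automated. The following is a minimal sketch, assuming the output contains standard Linux "inet <address>/<prefix>" lines; how you collect the output per node is up to you, and the node names, interface names and addresses in the sample are made-up placeholders.

```python
import re
from collections import defaultdict

# Matches lines such as "inet 192.0.2.10/32 scope global enp9s0".
INET_RE = re.compile(r"\binet\s+(\d{1,3}(?:\.\d{1,3}){3})/\d+")


def addresses(ip_addr_output):
    """Return the set of IPv4 addresses found in `ip addr` output."""
    return set(INET_RE.findall(ip_addr_output))


def duplicated_addresses(outputs_by_node):
    """Map each address seen on more than one node to the sorted node names."""
    owners = defaultdict(set)
    for node, output in outputs_by_node.items():
        for addr in addresses(output):
            owners[addr].add(node)
    return {a: sorted(n) for a, n in owners.items() if len(n) > 1}


if __name__ == "__main__":
    # Placeholder data: 192.0.2.10 plays the role of a VIP held by two nodes.
    sample = {
        "node1": "inet 192.0.2.11/24 scope global enp9s0\n"
                 "inet 192.0.2.10/32 scope global enp9s0\n",
        "node2": "inet 192.0.2.12/24 scope global enp9s0\n"
                 "inet 192.0.2.10/32 scope global enp9s0\n",
        "node3": "inet 192.0.2.13/24 scope global enp9s0\n",
    }
    for addr, nodes in duplicated_addresses(sample).items():
        print(f"{addr} is present on: {', '.join(nodes)}")
```

Any address reported here is present on more than one node, which is the symptom described above.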

I believe we got this problem after the upgrade, because before the upgrade the system was working fine and we do not remember any connection resets. After the upgrade we noticed that we could not distribute an image to a switch, and TAC found this bug to be the root cause.

To resolve it, TAC modified some Linux files via CLI. If you see a VIP address on more than one node, just open a TAC case and point to this bug.

Arne Bier · ‎10-12-2020

Sounds horrendous. I don't like the sound of this upgrade - so far I have not seen any notifications and it's been a month since the 2.1 Release Notes have been out.

Is it possible to get the installer ISO and just rebuild the system from the ground up? I think it would be cleaner in cases where DNAC is just being used for assurance purposes.

JL421-Retired · ‎10-13-2020

This sounds like the upgrade process for every version of DNA-C I've stepped through, apart from the rbac command and requesting access to the update. The request for the upgrade should end shortly as 2.1.2.x goes GA.

Of note, I upgraded my lab box and yes, it flowed exactly like every other DNA-C upgrade and followed the steps mentioned in the first post. I guess if you'd never performed one before, it may be a confusing process, but this is exactly how it's been since the 1.2.1 days.

9sobey · ‎10-14-2020

Hi,

I've had some issues with the upgrade and got the message below. Do you know what I should do to either go back to the old version or try the install again?

System update status:
Version successfully installed : 1.3.0.147
Version currently processed : 1.5.208
Update phase : failed
Update details : Node update took longer to complete in 172.26.144.1 [ version 1.5.208 phase 10 ]
Progress : 68%

Updater State:
Currently processed version : 1.5.208
State : FAILED
Sub-State : INSTALLED_HOST_COMPONENTS
Details : Node update took longer to complete in 172.26.144.1 [ version 1.5.208 phase 10 ]
Source : system-updater-standby
Abort pending : Not available
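For anyone who wants to watch for this condition from a script, status text in the shape pasted above ("Section:" headers followed by "Key : value" lines) is easy to parse. This is a minimal sketch that assumes exactly that layout; the function names are illustrative, and nothing here is an official Cisco tool or API.

```python
def parse_status(text):
    """Parse 'Section:' headers and 'Key : value' lines into nested dicts."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(" : ")
        if sep:
            if current is None:
                # Key/value lines before any header go under an empty name.
                current = sections.setdefault("", {})
            current[key.strip()] = value.strip()
        elif line.endswith(":"):
            current = sections.setdefault(line[:-1].strip(), {})
    return sections


def update_failed(sections):
    """True if either the update phase or the updater state reports failure."""
    phase = sections.get("System update status", {}).get("Update phase", "")
    state = sections.get("Updater State", {}).get("State", "")
    return phase.lower() == "failed" or state.upper() == "FAILED"


if __name__ == "__main__":
    pasted = """System update status:
Version successfully installed : 1.3.0.147
Version currently processed : 1.5.208
Update phase : failed
Progress : 68%

Updater State:
Currently processed version : 1.5.208
State : FAILED
Sub-State : INSTALLED_HOST_COMPONENTS
"""
    status = parse_status(pasted)
    print("failed:", update_failed(status))
    print("progress:", status["System update status"]["Progress"])
```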

JL421-Retired · ‎10-14-2020

When I've had those issues, step 1 has normally been to reboot the host that had issues and try to run the upgrade again once it is fully online. If it still fails, you'll want to open a TAC case for it.
