DNA Center system upgrade fails at 60%

Helena Cornic
Spotlight

Hi,

 

Upgrading a DNA Center running version 2.1.2.6 from system version 1.5.279 to 1.5.288 stops at 60%, in phase 7:

 

maglev system_updater update_info

 

 System update status:
 Version successfully installed : 1.5.279
 Version currently processed : 1.5.288
 Update phase : failed
 Update details : Updating node x.x.x.x failed
 Progress : 60%

 

Updater State:
 Currently processed version : 1.5.288
 State : FAILED
 Sub-State : INSTALLED_HOST_COMPONENTS
 Source : system-updater
Abort pending : False
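
For anyone who wants to check this status from a script, here is a minimal sketch that pulls a single field out of the output above. It assumes the "key : value" layout shown in this post; `updater_field` is a made-up helper name, not part of maglev.

```shell
# Print the value of one "key : value" field from
# `maglev system_updater update_info` output read on stdin.
# updater_field is a hypothetical helper name, not a maglev command.
updater_field() {
  awk -F'[ \t]+:[ \t]+' -v key="$1" '
    { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }   # trim whitespace around the key
    k == key { print $2; exit }                  # first exact match wins
  '
}
```

Usage would look like `maglev system_updater update_info | updater_field State`. The key is matched exactly, so `State` does not pick up the `Sub-State` line.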

 

We also tried to delete previously downloaded packages, as recommended in the DNA Center upgrade guide, with:

 

for pkg in $(maglev package status -o json | jq -r '.[] | select(.available!="-") | [ .name,.available | tostring ] | join (":")'); do maglev catalog package delete $pkg 2>/dev/null; done

but the result is the same.

 

This is the error in the system updater log:

 

{"asctime": "2022-04-25 17:15:18,039", "timeMillis": 1650906918.03956, "filename": "node_updater.py", "funcName": "_wait_for_node_update_completion_helper", "levelname": "ERROR", "levelno": 40, "lineno": 804, "module": "node_updater", "msecs": 39.56007957458496, "message": "Got completion notification for phase (7): Node x.x.x.x version 1.5.288 state FAILED node-phase 7 phase 7", "name": "node-updater", "pathname": "/opt/maglev/lib/python3.5/site-packages/system_updater/node_updater.py", "process": 136, "processName": "MainProcess", "relativeCreated": 494022.76253700256, "thread": 139664920917760, "threadName": "Thread-13", "level": "ERROR"}
{"asctime": "2022-04-25 17:15:18,040", "timeMillis": 1650906918.0401087, "filename": "system_update_orchestrator.py", "funcName": "process_system_updater_requests", "levelname": "ERROR", "levelno": 40, "lineno": 428, "module": "system_update_orchestrator", "msecs": 40.108680725097656, "message": "\nStatus: 1/Updating node x.x.x.x failed\n", "name": "system-updater", "pathname": "/opt/maglev/lib/python3.5/site-packages/system_updater/system_update_orchestrator.py", "process": 136, "processName": "MainProcess", "relativeCreated": 494023.3111381531, "thread": 139666200835840, "threadName": "MainThread", "level": "ERROR"}
{"asctime": "2022-04-25 17:15:18,047", "timeMillis": 1650906918.0473998, "filename": "k8s_proxy.py", "funcName": "inner_wrapper", "levelname": "WARNING", "levelno": 30, "lineno": 110, "module": "k8s_proxy", "msecs": 47.39975929260254, "message": "Exception while getting kubernetes objects (<pykube.http.HTTPClient object at 0x7f068a0ca128>): ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))", "name": "root", "pathname": "/opt/maglev/lib/python3.5/site-packages/system_updater/common/k8s_proxy.py", "process": 136, "processName": "MainProcess", "relativeCreated": 494030.6022167206, "thread": 139664409224960, "threadName": "Thread-17", "level": "WARN"}
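
Since the log is one JSON object per line, the interesting entries can be filtered with jq, which is already used in the package cleanup command above. This is only a sketch: `updater_errors` is a made-up name, the field names are the ones visible in the entries above, and the input must contain nothing but JSON lines.

```shell
# Print timestamp, level, function and message for ERROR/WARNING entries.
# Reads JSON-lines log text on stdin; requires jq.
# updater_errors is a hypothetical helper name.
updater_errors() {
  jq -r 'select(.levelname == "ERROR" or .levelname == "WARNING")
         | "\(.asctime) \(.levelname) \(.funcName): \(.message | gsub("\n"; " "))"'
}
```

Usage would be `updater_errors < updater.log`, where `updater.log` is a placeholder for wherever you saved the system updater log. Newlines embedded in a message are flattened to spaces so each entry stays on one line.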

 

I know that ideally I should open a TAC case, but this is an old lab environment and unfortunately I don't have TAC support.

 

Any clues?

 

Thank you,

Helena

1 Accepted Solution

Accepted Solutions

Helena Cornic
Spotlight

Hi,

 

Update:

I ran the maglev-config update wizard just to add a static route, and after that I tried the upgrade again. This time it worked fine:

 

System update status:
Version successfully installed : 1.5.288

Updater State:
Currently processed version : NONE
State : IDLE
Sub-State : NONE
Details : The system has been successfully updated
Source : system-updater
Abort pending : False

 

I suppose that restarting the nodes helped.

Let's see how it goes now along the rest of the path to version 2.3.3.

 

Cheers.

 


8 Replies

Hi,

kubectl get nodes

Are all nodes Ready?

Dan Rowe
Cisco Employee

What version is your Cisco DNA Center currently running? Depending on the code it is on, you may be able to run an AURA report, which performs various health checks on your Cisco DNA Center. The results of the AURA report may help you identify what is causing the upgrade failure.

 

Here is a link to the AURA report which provides step-by-step instructions of how to run it:

https://www.cisco.com/c/en/us/support/docs/cloud-systems-management/dna-center/215840-cisco-dna-center-aura-audit-and-upgrad.html

 

If your Cisco DNA Center is running code older than 2.x, you may be better off re-imaging it with the latest Cisco DNA Center release, 2.2.3.5. Re-imaging can potentially save you several hours, because upgrading from pre-2.x code to 2.2.3.x requires a multi-step upgrade.

Helena Cornic
Spotlight

Hi,

 

For the node, it seems to be ok: 

 

kubectl get nodes

 

NAME          STATUS   ROLES    AGE    VERSION
192.168.x.x   Ready    master   346d   v1.15.3-cisco
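
On a three-node cluster the same check is easier to script than to eyeball. A small sketch, assuming the column layout shown above (`not_ready_nodes` is a made-up name):

```shell
# List nodes whose STATUS column is anything other than exactly "Ready".
# Reads `kubectl get nodes` output on stdin; the header row is skipped.
# not_ready_nodes is a hypothetical helper name.
not_ready_nodes() {
  awk 'NR > 1 && NF >= 2 && $2 != "Ready" { print $1, $2 }'
}
```

Usage would be `kubectl get nodes | not_ready_nodes`. Empty output means every node reports Ready; a status such as `Ready,SchedulingDisabled` is deliberately listed too.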

 

As for the version, it's 2.1.2.6. I already ran the AURA report, but it doesn't help very much. The only warnings about the upgrade are:

 

Warning:Downloading a medium test image of size 150MB was aborted as it took > 30sec. Please increase the bandwidth to the cloud as this can slow down a DNAC upgrade. Over 10GB of software is downloaded during an upgrade.
Warning:Downloading a large test image of size 650MB was aborted as it took > 130sec. Please increase the bandwidth to the cloud as this can slow down a DNAC upgrade. Over 10GB of software is downloaded during an upgrade.

 

but the update doesn't fail in the downloading phase.

 

Re-imaging the system could be an option, but don't we need to contact Cisco TAC for that?

 

Thank you very much for your answers.

 

Helena

Run this command to check whether you have any expired certificates:

sudo maglev-config certs info

 

Recently I had to upgrade 10 clusters with 3 nodes each due to the Log4j vulnerability. Same scenario: 2.1.2.6 to 2.1.2.8.

It turned out that only 2 clusters upgraded on their own; the other 8 needed help from TAC and the BU.

Dan Rowe
Cisco Employee

No, it is not required to work with TAC to re-image the Cisco DNA Center assuming you have access to download the latest ISO from software.cisco.com. You can follow these steps to re-image the appliance:

 

https://www.cisco.com/c/en/us/td/docs/cloud-systems-management/network-automation-and-management/dna-center/2-2-3/install_guide/2ndgen/b_cisco_dna_center_install_guide_2_2_3_2ndGen/m_prepare_the_appliance_for_configuration_2_2_3_2ndgen.html#task_exk_...

Helena Cornic
Spotlight

Certificates are also ok:

 

$ sudo maglev-config certs info
--------------------------------------------------------------------------------
certificate                   start date                end date
--------------------------------------------------------------------------------
credentialmanager.pem         Mar 26 02:34:08 2022 GMT  Mar 26 02:34:08 2023 GMT
kong.pem                      Mar 26 02:34:08 2022 GMT  Mar 26 02:34:08 2023 GMT
kube-worker-1.pem             Mar 26 02:34:08 2022 GMT  Mar 26 02:34:08 2023 GMT
maglev-registry.pem           Mar 26 02:34:08 2022 GMT  Mar 26 02:34:08 2023 GMT
apiserver.crt                 May 17 09:45:50 2021 GMT  Mar 26 02:30:04 2023 GMT
apiserver-kubelet-client.crt  May 17 09:45:50 2021 GMT  Mar 26 02:30:05 2023 GMT
front-proxy-ca.crt            May 17 10:53:00 2021 GMT  May 15 10:53:00 2031 GMT
front-proxy-client.crt        May 17 10:53:00 2021 GMT  Mar 26 02:30:05 2023 GMT
admin.conf                    May 17 09:45:50 2021 GMT  Mar 26 02:30:06 2023 GMT
scheduler.conf                May 17 09:45:50 2021 GMT  Mar 26 02:30:07 2023 GMT
controller-manager.conf       May 17 09:45:50 2021 GMT  Mar 26 02:30:07 2023 GMT
--------------------------------------------------------------------------------
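
To turn that table into a quick expiry check, something like the following could work. It is a rough sketch that assumes GNU date (for `-d`) and the three-column layout shown above, with both dates ending in GMT; `cert_days_left` is a made-up name.

```shell
# Read `maglev-config certs info` output on stdin and print, for each
# certificate, the number of whole days until its end date
# (negative means already expired). Assumes GNU date for -d.
# cert_days_left is a hypothetical helper name.
cert_days_left() {
  now=$(date +%s)
  # Data rows have 11 whitespace-separated fields: name + 5 (start) + 5 (end).
  awk 'NF == 11 && $6 == "GMT" && $11 == "GMT" { print $1, $7, $8, $9, $10, $11 }' |
  while read -r name mon day time year tz; do
    end=$(date -d "$mon $day $time $year $tz" +%s) || continue
    echo "$name $(( (end - now) / 86400 ))"
  done
}
```

Usage would be `sudo maglev-config certs info | cert_days_left`. The header and separator lines are skipped because they do not have 11 fields.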

 

I'll start considering the re-imaging process, and try to get Cisco support in parallel.

 

Thanks,

Helena

Yeah, I also think that is the right decision. Good luck!
