
ASR 9K - Cluster upgrade question

m-avramidis
Level 1

I have been assigned the task of upgrading our client's ASR 9K clusters (5 sites). They are currently on release 4.3.2, so an ISSU upgrade is out of the question.

 

https://community.cisco.com/t5/service-providers-documents/upgrading-an-nv-edge-cluster/ta-p/3161698

I have chosen option 1 in the guide above. After copying the .tar package to the harddisk, which command do I run?

 

Do I run the same commands as I would if it had been a stand-alone node?

 

admin install add tar harddisk:"tar image name" synchronous

 

admin install activate disk0:"release version"

 

and then the commit command.

 

Or are other commands needed since it is a cluster setup?
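To be concrete, the full sequence I have in mind for the cluster-as-single-node approach is below. The placeholders are in quotes, and activating by the operation ID reported by the add saves typing every package name; please correct me if anything behaves differently on a cluster:

admin install add tar harddisk:"tar image name" synchronous
admin show install log
admin install activate id "add operation id" synchronous
admin install commit
admin show install active summary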

 

The reason for asking is that our client is a huge broadcasting company and nothing can go wrong. Unfortunately, I have not been able to test the upgrade on a cluster setup, since they have only one 9K for lab usage.

 

Upgrade plan:

4.3.2 > 6.1.4 > 6.5.3. I have tested this path, both upgrade and downgrade. It keeps all of the configuration that our client has; the only thing that needs to be added again after downgrading to 4.3.2 is the "new" Cisco root cert.
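For the record, the health checks I plan to run before and after each step are the usual ones (command names from memory; I am assuming "show dsc" in admin mode still reports both racks as expected on a cluster):

admin show dsc
show redundancy summary
show platform
show configuration failed startup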

Any tips and tricks and/or experience from a cluster upgrade are appreciated.

6 Replies

Giuseppe Larosa
Hall of Fame

Hello m-avramidis,

I performed a Cisco ASR 9K cluster upgrade with Cisco TAC guidance, going from IOS XR 5.1.3 to XR 5.3.1, some years ago.

It took 70 minutes to perform the upgrade, including the firmware upgrade of the linecards.

If I remember correctly, we used option 1, which means treating the cluster as a single node with two route processors.

Notice that because both chassis will reload at the same time, you will have a big network outage.

 

>> This means that we treat the cluster as a single logical node and the upgrade consists of applying the new image onto both racks at the same time and then reloading. This means that the whole cluster will take a hit and be upgraded at the same time.

 

In your case, with a double upgrade, you need to schedule a 3-hour maintenance window.

It is important to have someone on site to see and collect the console output during the upgrade.

 

From the document you have provided, the second method using the EEM script may be a better fit for your case, as it performs the upgrade on each chassis separately, with reduced network impact.

However, I have no direct experience of this.

 

Hope to help

Giuseppe

 

Thanks Giuseppe, I would have gone ahead with the script upgrade if I had had the opportunity to test the script on a cluster. We have asked for a 6.5-hour service window (we will route traffic through another site during the service window). If the first upgrade doesn't work, we will do a Turboboot re-image and then build the cluster again. I have tested this approach, and it took about 1.5 hours to complete (the major issue was licensing, but I will download the licenses from Cisco before the upgrade next week).

 

I will open a TAC case today; TAC will be on the phone during the upgrade.

 

What scares me a bit is that I have not run into any problems and/or issues during testing; everything has gone smoothly (a minor problem with the Turboboot process, where I changed the rommon config rather than following the step-by-step guide on the Cisco website). The upgrade from 4.3.2 > 6.1.4 took 47 min to complete, and from 6.1.4 > 6.5.3 it took 45 min. Turboboot took 1.5 hrs to complete (including adding all of the packages, multiple reboots, and adding the licenses and configuration).
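For completeness, the Turboboot fallback I tested follows the usual rommon procedure, roughly as below. The addresses and image name are placeholders, not our real values, and anyone repeating this should take the exact steps from the official re-image guide:

rommon> unset BOOT
rommon> IP_ADDRESS=192.0.2.10
rommon> IP_SUBNET_MASK=255.255.255.0
rommon> DEFAULT_GATEWAY=192.0.2.1
rommon> TURBOBOOT=on,disk0,format
rommon> sync
rommon> boot tftp://"tftp server"/"asr9k mini px .vm image"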

Hello M-avramidis,

>> I would have gone ahead with the script upgrade if I had had the opportunity to test the script on a cluster.

I agree on this

 

>> We have asked for a 6.5-hour service window (we will route traffic through another site during the service window). If the first upgrade doesn't work, we will do a Turboboot re-image and then build the cluster again. I have tested this approach, and it took about 1.5 hours to complete (the major issue was licensing, but I will download the licenses from Cisco before the upgrade next week).

 

Again this is reasonable.

>> I will open a TAC case today; TAC will be on the phone during the upgrade.

This is a wise move.

 

Hope to help

Giuseppe

 

 

Giuseppe, I will update this post with info after the upgrade.

The upgrade went fine, but... multicast streams did not work. We got a very good TAC engineer on the call (the best I have had); after about 30 minutes of troubleshooting he informed us that we were using an unsupported SW version. The last supported release for an nV cluster is 5.3.x. He gave us 2 options:

1. Downgrade

2. Break the cluster

Since neither of these options was feasible, we changed parts of the network to use unicast instead of multicast (the multicast problem: streams came in as they should but never left the rack). This workaround was OK with our client for 12 hrs. I then went back to our client and downgraded the cluster to version 4.3.2.
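For anyone hitting something similar (traffic arriving on the cluster but never leaving the rack), the kind of thing worth checking before calling TAC is whether the groups make it from PIM and MRIB down into the MFIB on the egress side, with something along these lines (group and linecard are placeholders):

show pim topology "group address"
show mrib route "group address"
show mfib route "group address" location "egress linecard"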

TAC is investigating what went wrong, since the information that I provided in the TAC case (when I created it) clearly stated (1) which setup our client uses and (2) the upgrade path we chose.

Hello m-avramidis,

I'm sorry that the upgrade was not successful.

>> The last supported release for an nV cluster is 5.3.x.

If this is true, this information should have been given to you before the upgrade attempt.

 

>> TAC is investigating what went wrong, since the information that I provided in the TAC case (when I created it) clearly stated (1) which setup our client uses and (2) the upgrade path we chose.

 

I agree that you have done everything possible on your side. If your description of the client's network setup included the need for multicast routing, this should have triggered an early warning about the proposed upgrade path.

 

Hope to help

Giuseppe

 
