
Migrating from RSP440 to RSP880 on ASR9000



 

1) Prerequisites:

 

Console access to the router. The router must be running release 5.3.2, or 5.3.1 with the engineering SMU.

The system must not have 800G (Tomahawk) line cards (see caveat (a) in the list below).

RSP1 must be the active RSP440 (see caveat (b) in the list below).

This process only works when upgrading from RSP-440-TR/SE to RSP-880. It does NOT apply to A9K-RSP-4G or A9K-RSP-8G (also known as RSP2); an upgrade from RSP2 to RSP880 is not graceful.

Before the upgrade process starts, and prior to removing any RSP from the chassis, make sure that “show redundancy” shows that the standby card is ready and NSR-ready.

Make sure any non-5.3.x software is removed from the system (e.g. install remove *4.3.4*)

Enable console logging on the router and on the terminal application (optional)
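
The release prerequisite above can be expressed as a small check. This is a minimal Python sketch, not Cisco tooling: the function names are hypothetical, and treating any release later than 5.3.2 as acceptable is an assumption (the article itself only names 5.3.2 and 5.3.1 with the engineering SMU). It expects a fully numeric release string such as "5.3.2".

```python
def parse_release(release: str) -> tuple:
    """Turn a numeric release string such as '5.3.2' into (5, 3, 2)."""
    return tuple(int(part) for part in release.strip().split("."))


def meets_release_prerequisite(release: str, has_engineering_smu: bool = False) -> bool:
    """Check the release prerequisite for the RSP440 to RSP880 migration.

    5.3.1 qualifies only with the engineering SMU installed.
    Assumption: 5.3.2 and anything later qualifies.
    """
    version = parse_release(release)
    if version == (5, 3, 1):
        return has_engineering_smu
    return version >= (5, 3, 2)
```

For example, meets_release_prerequisite("5.3.1") is False until the SMU flag is passed as True.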

 

2) Caveats:

 

a) The system must NOT have Tomahawk (400G or 800G) line cards in the chassis before attempting this. The EOBC link between a Tomahawk card and the newly inserted RSP880 will be down, which interferes with the failover from RSP440 to RSP880. If there are Tomahawk cards in the chassis, they must be removed or powered down for the upgrade to succeed, for example: ‘admin-config #hw-module power disable location 0/1/cpu0’


b) During the RSP440 to RSP880 upgrade, the active RSP440 must be in slot 1; otherwise the user will observe an online diag fabric punt failure to the LC on the new standby RSP880.

 

c) Remove Trident line cards from the system, as they are incompatible with the RSP880.

 

3) Upgrade process:

 

 

 RSP1 is the active RSP440 running 5.3.2 or 5.3.1 + SMU

If RSP0 is active and RSP1 is standby, first remove (OIR) the RSP0 card, or perform a redundancy switchover from RSP0 to RSP1, so that RSP1 becomes active. Then insert the RSP880 in slot 0.

 

Issue “show redundancy” and make sure the standby card is ready and NSR-ready

 

RP/0/RSP0/CPU0:vkg7RO#show redundancy summary

Wed Jun 17 11:18:51.391 PST

  Active/Primary   Standby/Backup

  --------------   --------------

  0/RSP0/CPU0(A)   0/RSP1/CPU0(S) (Node Ready, NSR: Ready)

  0/RSP0/CPU0(P)   0/RSP1/CPU0(B) (Proc Group Ready, NSR: Ready)

RP/0/RSP0/CPU0:vkg7RO#       

 

RP/0/RSP0/CPU0:vkg7RO#show redundancy

Wed Jun 17 11:20:19.086 PST

Redundancy information for node 0/RSP0/CPU0:

==========================================

Node 0/RSP0/CPU0 is in ACTIVE role

Node Redundancy Partner (0/RSP1/CPU0) is in STANDBY role

Standby node in 0/RSP1/CPU0 is ready

Standby node in 0/RSP1/CPU0 is NSR-ready

Node 0/RSP0/CPU0 is in process group PRIMARY role

Process Redundancy Partner (0/RSP1/CPU0) is in BACKUP role

Backup node in 0/RSP1/CPU0 is ready

Backup node in 0/RSP1/CPU0 is NSR-ready

 

Group            Primary         Backup          Status

---------        ---------       ---------       ---------

v6-routing       0/RSP0/CPU0     0/RSP1/CPU0     Ready

mcast-routing    0/RSP0/CPU0     0/RSP1/CPU0     Ready

netmgmt          0/RSP0/CPU0     0/RSP1/CPU0     Ready

v4-routing       0/RSP0/CPU0     0/RSP1/CPU0     Ready

central-services 0/RSP0/CPU0     0/RSP1/CPU0     Ready

dlrsc            0/RSP0/CPU0     0/RSP1/CPU0     Ready

dsc              0/RSP0/CPU0     0/RSP1/CPU0     Ready

 

Reload and boot info

----------------------

A9K-RSP440-SE reloaded Fri Jun 12 14:40:22 2015: 4 days, 20 hours, 39 minutes ago

Active node booted Fri Jun 12 20:52:24 2015: 4 days, 14 hours, 27 minutes ago

Last switch-over Mon Jun 15 10:43:34 2015: 2 days, 36 minutes ago

Standby node boot Mon Jun 15 10:44:53 2015: 2 days, 35 minutes ago

Standby node last went not ready Mon Jun 15 10:46:27 2015: 2 days, 33 minutes ago

Standby node last went ready Mon Jun 15 10:47:27 2015: 2 days, 32 minutes ago

There have been 3 switch-overs since reload

 

Active node reload "Cause: Initiating switch-over."

Standby node reload "Cause: Initiating switch-over."

 

RP/0/RSP0/CPU0:vkg7RO#                                                       
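
The readiness check above can also be scripted against a captured copy of the “show redundancy” output. The following is a minimal Python sketch, assuming the output looks like the sample shown here; the function name standby_is_ready is hypothetical and is not part of any Cisco tooling.

```python
import re

# Node names as they appear in the sample output, e.g. 0/RSP1/CPU0
NODE = r"\d+/RSP\d+/CPU\d+"


def standby_is_ready(show_redundancy_text: str) -> bool:
    """Return True if the standby is ready, NSR-ready, and every group row says Ready."""
    text = show_redundancy_text.replace("\r", "")
    node_ready = re.search(
        rf"^Standby node in {NODE} is ready[ \t]*$", text, re.MULTILINE
    )
    nsr_ready = re.search(
        rf"^Standby node in {NODE} is NSR-ready[ \t]*$", text, re.MULTILINE
    )
    # Group rows look like: "v6-routing  0/RSP0/CPU0  0/RSP1/CPU0  Ready"
    groups = re.findall(
        rf"^(\S+)[ \t]+{NODE}[ \t]+{NODE}[ \t]+(.*\S)[ \t]*$", text, re.MULTILINE
    )
    all_groups_ready = bool(groups) and all(status == "Ready" for _, status in groups)
    return bool(node_ready and nsr_ready and all_groups_ready)
```

The function returns False when no group rows are found at all, so an empty or truncated capture is not mistaken for a healthy system.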

 

 

3)     Remove the standby RSP440 (RSP0)

 

4)      Insert RSP880 (A) in slot 0.

 

5)      Issue Ctrl-C to break into ROMMON on the inserted RSP880

                (If this is missed, the RSP880 will go into a boot loop and there will be another chance to break into ROMMON on the next bootup. There are no side effects apart from syslog messages on the active RSP.)

 

 

NOTE: If you're running 5.3.x or a later release, you may get a menu with various boot options. Select "boot IOS XR in 32-bit mode" if the requirement is to stay on 32-bit XR.

 

6)      From the ROMMON prompt on RSP880 (A), set the ROMMON variable that forces 1GE mode for peer RSP communication (RSP_LINK_1G=1), and check that the config register is set correctly in ROMMON

               rommon B1> RSP_LINK_1G=1

               rommon B1> sync

 

7)      Reset the RSP880 (A) card. It will boot up to standby state and synchronize the configuration from the active RSP440 in slot 1

               rommon B1> reset -h

 

8)      Verify with the “show redundancy” command that the RSPs have reached full synchronization and that the groups are in “Ready” state.

               show redundancy

 

9)      Verify the RSPs have synchronized the SNMP engine ID and SNMP ifindex-table

                #more disk0:snmp/ifindex-table loc 0/rsp0/cpu0

                #more disk0:snmp/ifindex-table loc 0/rsp1/cpu0

                #more disk0:snmp/snmp_persist loc 0/rsp0/cpu0

                #more disk0:snmp/snmp_persist loc 0/rsp1/cpu0

               Manually copy any EEM scripts from the RSP440 disk to the RSP880 disk, if applicable

 

 

10)      Fail over from the active RSP440 (slot 1) to the standby RSP880 (slot 0)

                “redundancy switchover”

 

 NOTE: A CLI switchover is needed; a physical OIR of the RP will not do the trick.

 

11)       Verify that the RSP880 becomes the active RSP and that the configuration is present

                At this point the RSP440 must be removed (ejected) from slot 1.

 

12)      Insert RSP880 (B) and allow it to boot to standby.  Do *not* set the ROMMON variable

 

13)      Issue the following commands to confirm that the RSPs have synchronized and are in the correct state

                'show platform'

                'show redundancy'

                Manually copy any EEM scripts from the active RSP disk to the standby RSP disk, if applicable

 

  

14)      On Active RSP880 Clear ROMMON variable from XR

               run nvram_rommonvar RSP_LINK_1G 0

               more nvram:/classic-rommon-var

 

If the newly inserted second RSP880 fails to boot up, park the standby RSP880 in ROMMON.

Type “set” to list the ROMMON variables and check for the presence of “RSP_LINK_1G=1”.

If it is set, unset RSP_LINK_1G and reset the card with the steps below.

 

rommon B1> priv

rommon B1> unset RSP_LINK_1G

rommon B1> sync

rommon B1> sync

rommon B1> reset -h

 

15)    Perform FPD upgrade on the RSP880 (optional)

               admin upgrade hw-module fpd all location 0/rsp0/cpu0

               admin upgrade hw-module fpd all location 0/rsp1/cpu0

 

16)    Deactivate and remove the SMU.

 

17)    Check that the EOBC ports are running at 10G with the following four commands; the output should be “10G/KR/1-lane”

               show controllers epm-switch port-status 54 loc 0/rsp0/cpu0 | in Mode

               show controllers epm-switch port-status 55 loc 0/rsp0/cpu0 | in Mode

               show controllers epm-switch port-status 54 loc 0/rsp1/cpu0 | in Mode

               show controllers epm-switch port-status 55 loc 0/rsp1/cpu0 | in Mode

 

18)    Final checks to ensure the system is operational and functional

 

Note: this operation has been tested to be graceful with NSR configured
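
The verification steps in the procedure (comparing the SNMP files between the two RSPs, and confirming the EOBC mode) can be sketched in Python against captured command output. This is a minimal sketch under stated assumptions: the function names are hypothetical, the captured text is assumed to contain only the file or command output (any timestamp or banner lines would need to be stripped first), and the only string taken from the procedure is the expected mode “10G/KR/1-lane”.

```python
# Expected EOBC mode string, as stated in the procedure
EXPECTED_EOBC_MODE = "10G/KR/1-lane"


def _normalized_lines(text: str) -> list:
    """Drop carriage returns and trailing whitespace so console captures compare cleanly."""
    return [line.rstrip() for line in text.replace("\r", "").strip().splitlines()]


def files_match(rsp0_text: str, rsp1_text: str) -> bool:
    """Compare captured 'more disk0:snmp/...' output from both RSPs line by line."""
    return _normalized_lines(rsp0_text) == _normalized_lines(rsp1_text)


def eobc_ports_not_at_10g(captured: dict) -> list:
    """Return the labels of captured port-status outputs that lack the expected mode.

    'captured' maps a caller-chosen label (for example 'port 54 on 0/rsp0/cpu0')
    to the text returned by the corresponding show command.
    """
    return [label for label, text in captured.items() if EXPECTED_EOBC_MODE not in text]
```

An empty list from eobc_ports_not_at_10g means all four captured outputs reported the expected mode. A non-empty list names the ports to investigate, for example whether RSP_LINK_1G is still set.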

 

 

Comments

Hi Eddie..

I'm running a new test on 5.3.3. I want to make an easy transition from RSP440 to RSP880.

The new test failed the same way. When I issue the switchover to the RSP880 I lose about 10-12 pings. Everything looks good for about 1-2 minutes, then the router stops forwarding any packets.

If I reload the router it works. Below is a snippet of the "term mon" view.

RP/0/RSP0/CPU0:Jan  9 09:26:15.695 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : pim(1) (jid 1177) did not signal availability   

RP/0/RSP0/CPU0:Jan  9 09:26:15.696 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : pim6(1) (jid 1178) did not signal availability   

%SMART_LIC-3-COMM_FAILED:Communications failure with Cisco licensing cloud: Communications init failure

RP/0/RSP0/CPU0:Jan  9 09:26:30.385 CET: tacacsd[1150]: %SECURITY-AAA-4-WHITESPACE_TRUNCATED_IN_SERVER_KEY : WARNING: The server key contained trailing whitespace and was truncated

RP/0/RSP0/CPU0:Jan  9 09:27:12.017 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : lamptest-rp(1) (jid 308) exited, will be respawned with a delay (slow-restart)   

RP/0/RSP0/CPU0:Jan  9 09:27:12.018 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : lamptest-rp(308) (fail count 1) will be respawned in 10 seconds 

RP/0/RSP0/CPU0:Jan  9 09:27:12.031 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : ipv6_acl_daemon(1) (jid 294) exited, will be respawned with a delay (slow-restart)   

RP/0/RSP0/CPU0:Jan  9 09:27:12.031 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : ipv6_acl_daemon(294) (fail count 1) will be respawned in 10 seconds 

RP/0/RSP0/CPU0:Jan  9 09:27:12.989 CET: pim[1177]: %ROUTING-IPV4_PIM-3-BGP_ENS_ERROR : PIM BGP communication, PIM-ENS-P: Producer failed to create NRS class for general data, 'NRS' detected the 'try again' condition 'Problem connection to the server'

RP/0/RSP0/CPU0:Jan  9 09:27:12.990 CET: pim[1177]: %ROUTING-IPV4_PIM-4-PROC_EXIT : [1582] process exiting ...

RP/0/RSP0/CPU0:Jan  9 09:27:13.022 CET: pim6[1178]: %ROUTING-IPV4_PIM-3-BGP_ENS_ERROR : PIM BGP communication, PIM-ENS-P: Producer failed to create NRS class for general data, 'NRS' detected the 'try again' condition 'Problem connection to the server'

RP/0/RSP0/CPU0:Jan  9 09:27:13.024 CET: pim6[1178]: %ROUTING-IPV4_PIM-4-PROC_EXIT : [1582] process exiting ...

RP/0/RSP0/CPU0:Jan  9 09:27:14.141 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : pim(1) (jid 1177) exited, will be respawned with a delay (slow-restart)   

RP/0/RSP0/CPU0:Jan  9 09:27:14.141 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : pim(1177) (fail count 1) will be respawned in 10 seconds 

RP/0/RSP0/CPU0:Jan  9 09:27:14.192 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : pim6(1) (jid 1178) exited, will be respawned with a delay (slow-restart)   

RP/0/RSP0/CPU0:Jan  9 09:27:14.192 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : pim6(1178) (fail count 1) will be respawned in 10 seconds 

RP/0/RSP0/CPU0:Jan  9 09:27:15.095 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : envmon(1) (jid 209) exited, will be respawned with a delay (slow-restart)   

RP/0/RSP0/CPU0:Jan  9 09:27:15.095 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : envmon(209) (fail count 1) will be respawned in 10 seconds 

LC/0/0/CPU0:Jan  9 09:27:23.069 CET: pfm_node_lc[300]: %PLATFORM-DIAGS-3-LC_EOBC_FAILED_RSP1 : Set|online_diag_lc[172117]|LC EOBC Test(0x2000005)|Standby

LC/0/0/CPU0:Jan  9 09:27:30.000 CET: ipv4_io[245]: %IP-IP_EA-3-BLOCKAGE : FIB update is taking longer than 60 secs for 6 items, please investigate for blocked processes

RP/0/RSP0/CPU0:Jan  9 09:27:49.170 CET: bgp[1058]: %IP-LIBIPPIMDLL-3-ENS_NRS_FAIL : Failure (Producer failed to create NRS class for general data) reason - ('NRS' detected the 'try again' condition 'Problem connection to the server')

RP/0/RSP0/CPU0:Jan  9 09:27:49.170 CET: bgp[1058]: %IP-LIBIPPIMDLL-3-PIM_LIB_SYSDB : Failure (bgp-dll producer init failed) reason - ('NRS' detected the 'try again' condition 'Problem connection to the server')

RP/0/RSP0/CPU0:Jan  9 09:27:49.219 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : mpls_lsd(1) (jid 338) exited, will be respawned with a delay (slow-restart)   

RP/0/RSP0/CPU0:Jan  9 09:27:49.219 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : mpls_lsd(338) (fail count 1) will be respawned in 10 seconds 

RP/0/RSP0/CPU0:Jan  9 09:27:54.197 CET: dumper[58]: %OS-DUMPER-4-SIGNAL_NUMBER : Thread 1 received SIGNAL 6, si code 0, si errno 0

RP/0/RSP0/CPU0:Jan  9 09:27:54.197 CET: dumper[58]: %OS-DUMPER-4-SIGNALCORE_INFO : Core for pid = 204904 (pkg/bin/cluster_dlm_rp) as signal 6 sent by pkg/bin/cluster_dlm_rp@node0_RSP0_CPU0

RP/0/RSP0/CPU0:Jan  9 09:27:55.185 CET: rsi_master[402]: %OS-RSI_MASTER-3-GENERAL_FAILURE : 'rm_go_active' failed: 'NRS' detected the 'try again' condition 'Problem connection to the server'  : pkg/bin/rsi_master : (PID=536734) :  -Traceback= 42006b6 d4a7bae d53f445 d53d518 4200750 4200a5d 4200038

RP/0/RSP0/CPU0:Jan  9 09:27:55.221 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : rsi_master(1) (jid 402) exited, will be respawned with a delay (slow-restart)   

RP/0/RSP0/CPU0:Jan  9 09:27:55.222 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : rsi_master(402) (fail count 1) will be respawned in 10 seconds 

RP/0/RSP0/CPU0:Jan  9 09:27:55.229 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : nfmgr(1) (jid 1160) exited, will be respawned with a delay (slow-restart)   

RP/0/RSP0/CPU0:Jan  9 09:27:55.230 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : nfmgr(1160) (fail count 1) will be respawned in 10 seconds 

RP/0/RSP0/CPU0:Jan  9 09:27:55.553 CET: instdir[255]: %INSTALL-INSTMGR-3-FAILED_ENS_INIT : Install director was unable to initialise its event notification service. NRS class creation failed: 'NRS' detected the 'try again' condition 'Problem connection to the server'

RP/0/RSP0/CPU0:Jan  9 09:27:55.768 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : cluster_dlm_rp(1) (jid 171) exited, will be respawned with a delay (slow-restart)   

RP/0/RSP0/CPU0:Jan  9 09:27:55.768 CET: sysmgr[95]: %OS-SYSMGR-3-ERROR : cluster_dlm_rp(171) (fail count 1) will be respawned in 10 seconds 

RP/0/RSP0/CPU0:Jan  9 09:27:58.198 CET: dumper[58]: %OS-DUMPER-4-SIGSEGV : Thread 1 received SIGSEGV - Segmentation Fault

RP/0/RSP0/CPU0:Jan  9 09:27:58.198 CET: dumper[58]: %OS-DUMPER-4-SIGSEGV_INFO : Accessed BadAddr 0x0 at PC 0xa9582f7. Signal code 1 - SEGV_MAPPER. Address not mapped.

RP/0/RSP0/CPU0:Jan  9 09:27:58.199 CET: dumper[58]: %OS-DUMPER-4-CRASH_INFO : Crashed pid = 565567 (pkg/bin/icpe_satmgr)

Regards

Torben

Hi..

Sorry.. It works!! Had a TAC engineer helping me. 

The first thing he noticed was that there was a Tomahawk installed. Just removed that.

Assumption is the mother of all... Sorry..

Thx. to Vladimir from TAC..  Great !!

regards Torben 

Cisco Employee

:) glad ..

System must not have 800G (tomahawk) line cards, (see caveat (a) in list below)

 

Cisco Employee

Eddie,

I have two questions:

#1) Can we upgrade ASR9010 from 5.1.3 (RSP440) to 6.1.X(RSP880)?

#2) Will there be any downtime?

Cisco Employee

1) RSP880 doesn't support 5.1.3, and both RSPs must be on the same code during the upgrade. Therefore you need to upgrade the RSP440 to 6.1.x first, before you migrate to RSP880.

2) This can be done in a hitless manner if you follow the steps in the document.

Regards

Eddie. 

Cisco Employee

Eddie,

Thanks for the information.

I will follow procedures:

#1) Upgrade from 5.1.3 to 6.1.x with RSP440

#2) After system is 6.1.x, Migrate RSP440 to RSP880.

Thanks,

Charlie

Cisco Employee

Hello Eddie

I was wondering about the recovery from 1G to 10G on RSP880

We have routers with RSP880s that are running the 1G setting because the RSP440 to RSP880 migration procedure was not followed 100%. Now we want to revisit these routers and change RSP_LINK_1G to 0, which prompts a couple of questions:

1- After resetting the variable via "run nvram_rommonvar RSP_LINK_1G 0", does the RSP880 need a reboot for the variable to take effect?

2- If a reboot is needed, a switchover will be done so that a reboot occurs. Would that create an out-of-sync situation where the two RSPs cannot communicate, since one would have 10G configured and the other 1G?

3- If point 2 above is accurate, how can we change the variable back to 10G without impact?

thank you , Mustapha

Cisco Employee

Mustapha,

You won't have an out-of-sync issue. Reset the var, do a failover, then do a second failover so the change is applied on both RSPs.

Regards

Eddie.

Cisco Employee

Hi Eddie

Sorry, I am trying to understand how this will not be impacting, so I have some follow-up questions that I hope will clear it up for me:

1- Does the variable change need an RSP reset to take effect, or is the change applied as soon as "run nvram_rommonvar RSP_LINK_1G 0" is entered?

2- If the change takes effect right away, would the standby RSP's communication with the active be affected, with one having COBC at 10G and the other at 1G?

3- If the change takes effect upon reload when switching to the standby RSP, will the new standby RSP be set at 10G while the new active RSP stays at 1G until the new active RSP is reloaded as well?

Do you mind clarifying these points, or stating the steps from the variable change until both RSPs are back at 10G?

thank you , mustapha

Cisco Employee

Hi Mustapha,

ROMMON variables take effect only during the boot of the card. Your understanding is correct on point #3.

regards,

/Aleksandar

Cisco Employee

Hi,Eddie

I have two questions:

Q.1:

10)      Failover from active RSP440 (slot 1) to standby RSP880 in (Slot0)

“redundancy switchover”

 

Is there any possibility that control/data plane traffic drop during this step with NSR?

Especially, when the following protocols have been deployed:OSPFv2/v3/BGP/LDP/RSVP

 

Q.2:

15)    Perform FPD upgrade on the RSP880 (optional)

 

In what situation do we need to perform this step?

In other words, can I assume there is no need for this step if the command “admin upgrade hw-module fpd all location 0/rsp0/cpu0” outputs all YES?

Thank you in advance.

King

Cisco Employee

Hi King:

1) NSR packets are replicated in the LC by LPTS to both RSPs, so there is no chance that one RSP gets them and the other does not. If you're asking why RSPs can then get out of NSR sync: there could be a comms problem between them at any layer. So you're good.

2) The command you have is how FPDs are upgraded; upgrading is a good idea after you've done the migration.

Regards

Eddie.

Cisco Employee

Hi,Eddie

Thank you for your prompt reply.

For Q1, it's about step 10: if the “redundancy switchover” command is issued to fail over from the active RSP440 (slot 1) to the standby RSP880 (slot 0), will there be any traffic impact if NSR is enabled and the following protocols have been deployed: OSPFv2/v3/BGP/LDP/RSVP?

Per my understanding, there should not be any traffic impact, right?

Thank you in advance.

King

Cisco Employee

Correct - there should be no impact.

Beginner

Has this been tested with the A9K-VSM-500 line cards installed?
