Mark Pace Balzan
Beginner

ASR9912 RP2 - 5.3.3 -> 6.5.3 tragic crash and reboot loop. Need help!

Hi,


I need some help from the experts in this forum with a problem I'm facing.

 

I'm upgrading an ASR9912 with RP2, SFC2, Typhoon and Tomahawk cards (no Trident). The box is currently running 5.3.3, and the target release is 6.7.3 (32-bit), with 6.5.3 as an intermediate release because a direct 5.3.3 -> 6.7.3 upgrade is not supported.

 

To do this, I ran 'install add tar usb:/6.5.3.tar activate' to install 6.5.3.

During the upgrade, some FPD upgrades were reported as failed while others succeeded, and the box then rebooted as part of the process once activation completed.

 

After reloading, both RP2s crashed, rebooted, and kept crashing and rebooting in a loop, with the messages shown in the console output further down.

 

Eventually, after about 5 reboots, one of the RP2s recovered, but the other did not recover at all.

 

I could see that the recovered RP was indeed running 6.5.3, but many FPDs had not been upgraded, and attempting a manual upgrade did not work. Since 6.5.3 was not committed, I reloaded the 9912 to boot the committed 5.3.3 software again, as this was the only way to bring the box back.

 

Clearly something went tragically wrong.


So the questions I have are:

 

  • Is 5.3.3 -> 6.5.3 a correct upgrade path for the 9912 with RP2, or is some other release better as an interim, or must a different process be followed for this 9912? I have successfully upgraded from 5.3.3 to various 6.x releases on multiple other ASR9k platforms, but never on a 9912 with RP2 until now, where it clearly failed.
  • Is it considered good and valid practice to remove one of the RPs during the upgrade to keep it 'isolated and safe/known working', so that in case of a major issue like the one I saw, which made the RP useless, we can roll back more easily by simply putting it back in the slot and booting off this 'safe/old' RP?
  • I know that turboboot is NOT recommended as an upgrade path, but I'm not sure why, or when it should actually be considered.

thanks

 

Mark

 

!!! WARNING !! - Rommon booted from backup flash !!!

pcie_device_get_cnfg: Failed because of 'Subsystem(3290)' detected the 'warning' condition 'Code(4)' venid 0x8086 devid 0x10fc idx 0

pcie_device_get_cnfg: Failed because of 'Subsystem(3290)Failed to rename debug file, 18, src: /nvram:/sysmgr.log.timeout.Z, target: /nvram:/prev.sysmgr.log.timeout.Z

Nov 17 00:36:12.871 : SYSMGR_LITE: Saving init logs in /nvram:/sysmgr.log.timeout.Z ...
' detected the 'Nov 17 00:36:12.988 : SYSMGR_LITE: INIT: respawn 'vkg_dmac_svr' disabled, exit_code 40704, INIT_MAX_SPAWN reached
warning' conditireboot internal : cause code 671088647 cause INIT: respawn 'vkg_dmac_svr' disabled, exit_code 40704, INIT_MAX_SPAWN reached
Failed to rename debug file, 18, src: /nvram:/sysmgr.log.timeout.Z, target: /nvram:/prev.sysmgr.log.timeout.Z
reboot_internal timeout 30 is graceful no
No
vR e1b7o 0o0t: o3n6 :1A3SR9912 RP2 (0x100326) in slot 0
By init via REBOOT_CAUSE_SYSMGR (2c000007)
Current time: 2021-11-17 00:36:13.463, Up time: 10s
A kernel core file was explicitly requested by process init
Reboot Reason: Cause code 0x2c000007 Cause: INIT: respawn 'vkg_dmac_svr' disabled, exit_code 40704, INIT_MAX_SPAWN reached Process: init                                                                   Traceback: a892949 a892e62 a892c95 42073ce a7e0070 0

Active process(s):
    proc/boot/procnto-smp-instr pid 1 tid 1 on cpu 0, pri 0
    proc: fdfe4010, utime = 84838 ms, stime = 2015 ms
    thread: fdfc4010, thread sutime = 10603 ms, pc = fe6ee84a

    proc/boot/procnto-smp-instr pid 1 tid 2 on cpu 1, pri 0
    proc: fdfe4010, utime = 84838 ms, stime = 2015 ms
    thread: fdfc4348, thread sutime = 10942 ms, pc = fe6ee84a

    proc/boot/procnto-smp-instr pid 1 tid 3 on cpu 2, pri 0
    proc: fdfe4010, utime = 84838 ms, stime = 2015 ms
    thread: fdfc4680, thread sutime = 10818 ms, pc = fe6ee84a

    proc/boot/procnto-smp-instr pid 1 tid 4 on cpu 3, pri 0
    proc: fdfe4010, utime = 84838 ms, stime = 2015 ms
    thread: fdfc49b8, thread sutime = 10628 ms, pc = fe6ee84a

    x86/bin/init pid 8196 tid 6 on cpu 4, pri 10
    proc: fdfe4760, utime = 59 ms, stime = 9 ms
    thread: fdfcb9b8, thread sutime = 1 ms, pc = a8930d9
    eax = aaab7c8, ebx = 28000007, ecx = a8ddf64, edx = 28000007
    edi = 1500ba1c, esi = 0, ebp = 40faf6c, exx = fdfcbcd0
    cs  = f3, efl = 1287, esp = 40faad4, ss  = fb

    pkg/bin/pciesvr pid 28687 tid 1 on cpu 5, pri 10
    proc: fdfe64f0, utime = 361 ms, stime = 101 ms
    thread: fdfd5680, thread sutime = 361 ms, pc = aae77b2
    eax = e, ebx = 1500bc2c, ecx = 41ff63c, edx = aae77b2
    edi = 15007eb8, esi = 423fbb0, ebp = 41ff6c8, exx = fdfd5998
    cs  = f3, efl = 80003206, esp = 41ff63c, ss  = fb

    pkg/bin/devc-conaux pid 12299 tid 5 on cpu 6, pri 21
    proc: fdfe59d0, utime = 40 ms, stime = 5 ms
    thread: fdfd3010, thread sutime = 16 ms, pc = ad1d354
    eax = d, ebx = b4, ecx = 1, edx = 2f9
    edi = 1, esi = 30, ebp = 411bf9c, exx = fdfd3328
    cs  = f3, efl = 3202, esp = 411bf64, ss  = fb

    proc/boot/procnto-smp-instr pid 1 tid 11 on cpu 7, pri 10
    proc: fdfe4010, utime = 84838 ms, stime = 2015 ms
    thread: fdfde680, thread sutime = 243 ms, pc = fe6bebfd


Release mastership on RP2
Normal reboot
Writing crashinfo 
Crash Reason: Cause code 0x2c000007 Cause: INIT: respawn 'vkg_dmac_svr' disabled, exit_code 40704, INIT_MAX_SPAWN reached Process: init                                                                   Traceback: a892949 a892e62 a892c95 42073ce a7e0070 0

Exception at 0xa8930d9 signal 5 c=2 f=0

Active process(s):
        proc/boot/procnto-smp-instr Thread ID 0 on cpu 0
        proc/boot/procnto-smp-instr Thread ID 1 on cpu 1
        proc/boot/procnto-smp-instr Thread ID 2 on cpu 2
        proc/boot/procnto-smp-instr Thread ID 3 on cpu 3
        x86/bin/init Thread ID 5 on cpu 4
        pkg/bin/pciesvr Thread ID 0 on cpu 5
        pkg/bin/devc-conaux Thread ID 4 on cpu 6
        proc/boot/procnto-smp-instr Thread ID 10 on cpu 7

Reboot reason: Cause: INIT: respawn 'vkg_dmac_svr' disabled, exit_code 40704, INIT_MAX_SPAWN reached Process: init                                                                   Traceback: a892949 a892e62 a892c95 42073ce a7e0070 0



Dumping local syslog messages
RP/0/RP0/CPU0:Nov 17 00:36:12.516 : pciesvr[77]: %PLATFORM-PCIE-7-GEN_DEBUG : PCI_IoMsg - IOM_PCIE_GET_DEVICE_CONFIG not found venid 0x8086 devid 0x10fc inx 0  
RP/0/RP0/CPU0:Nov 17 00:36:12.524 : vkg_dmac_svr[78]: pcie_device_get_cnfg: reply_status fail! venid 0x8086 devid 0x10fc idx 0 
RP/0/RP0/CPU0:Nov 17 00:36:12.527 : vkg_dmac_svr[78]: %PLATFORM-DMAC-3-OPERATION_FAIL : dmac_hw_ini failure, error code Invalid argument  
RP/0/RP0/CPU0:Nov 17 00:36:12.536 : pciesvr[77]: %PLATFORM-PCIE-7-GEN_DEBUG : PCI_Io
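When comparing several console captures like the one above, it can help to pull out which process hit the respawn limit on each reboot. The following is a minimal Python sketch, assuming the 'Reboot Reason' line has the same wording as in this capture; the function name `parse_reboot_reason` and the regular expression are illustrative and are not part of any Cisco tool.

```python
import re

# Matches the init respawn failure text as it appears in this capture.
# The pattern is an assumption based on this one log, not a documented format.
RESPAWN_RE = re.compile(
    r"Cause code (?P<code>0x[0-9a-fA-F]+) Cause: INIT: respawn "
    r"'(?P<process>[^']+)' disabled, exit_code (?P<exit_code>\d+), "
    r"INIT_MAX_SPAWN reached"
)


def parse_reboot_reason(text):
    """Return cause code, process and exit code of the first match, or None."""
    match = RESPAWN_RE.search(text)
    if match is None:
        return None
    return {
        "cause_code": int(match.group("code"), 16),
        "process": match.group("process"),
        "exit_code": int(match.group("exit_code")),
    }


# Sample line copied from the console output in this post.
sample = (
    "Reboot Reason: Cause code 0x2c000007 Cause: INIT: respawn "
    "'vkg_dmac_svr' disabled, exit_code 40704, INIT_MAX_SPAWN reached "
    "Process: init"
)
info = parse_reboot_reason(sample)
print(info)
```

Running this over each saved capture shows whether the same process (here `vkg_dmac_svr`) is failing on every reboot or whether the failure moves around.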

 

4 Replies
tkarnani
Cisco Employee

Hi Mark,

 

I have attached the upgrade MOP.

  • It looks like the path recommended in the doc is 5.3.4 to 6.5.3.
  • Removing the standby RP/RSP is a way to recover if the upgrade fails. If it fails or something happens to the active RP, you can remove it, plug in the backup, and you will be back online.
  • Turboboot can be used. It wipes the disk and starts from scratch with whatever version you like. It does not preserve the existing configuration, packages, etc., so it requires a bit more work than an upgrade path, which will preserve most of the configuration.

In your case, with the RP failing, I would recommend booting into ROMMON and booting via USB using turboboot to see if it recovers.

 

thanks

 

Hi

 

Thanks for the feedback.

Since my starting point is 5.3.3, not 5.3.4, what is the best approach?

Should I upgrade first to 5.3.4, then 6.5.3, then 6.7.3? That seems like a long stretch.

Or should I consider an earlier 6.x release that is supported directly from 5.3.3?

 

One other question: should I keep FPD auto-upgrade enabled or not, as perhaps that also caused some issues? I recall needing to do this in an earlier release, but I'm not sure whether it should be on or off for an upgrade from 5.3.3.

 

Thanks again for your help

 

Mark

Hi Mark,

 

On the downloads page there is a docs.tar file that contains the upgrade MOP.

Unfortunately, most of them only cover the upgrade path from an extended maintenance release to another release.

In your case, 5.3.4 is the extended release for 5.3.x code. If you were to upgrade to 6.2.x, for example, it would require another upgrade.

 

So it seems that 2 upgrades are unavoidable. Would you be able to turboboot an RSP in a spare router to get the base code + configuration on? It might be faster as it's 1 step.

 

Thanks

 

Hi,

 

Unfortunately I don't have a spare 9912, but I can try turboboot to recover the failed RP2, or else RMA it.

 

What is still not clear to me is whether it is mandatory to go from 5.3.3 to 5.3.4 before going to 6.x.

 

I see that the MOP for 6.5.3 clearly states that upgrade from 5.3.4 is supported, but I believe 5.3.3 is also an EMR, and I've seen posts where it was confirmed that 5.3.3 -> 6.x is supported. Indeed, it worked fine in my lab on a 9904 with RSP880.

 

So perhaps 5.3.3 -> 6.x is only OK in some cases and not in others?

 

I guess if it's 5.3.3 -> 5.3.4 -> 6.5.3 -> 6.7.3, then indeed turboboot may make sense.
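To make the hop count concrete, here is a minimal Python sketch that treats the upgrade steps mentioned in this thread as a directed graph and finds the fewest-hop path with a breadth-first search. The `SUPPORTED_HOPS` table is an assumption built only from what was said above (5.3.4 -> 6.5.3 per the MOP, 6.5.3 -> 6.7.3 as the planned second step, and 5.3.3 -> 5.3.4 as a maintenance upgrade). It is not an authoritative support matrix, so check the MOP for each release before relying on any hop.

```python
from collections import deque

# Hops mentioned in this thread only; NOT an authoritative support matrix.
SUPPORTED_HOPS = {
    "5.3.3": ["5.3.4"],
    "5.3.4": ["6.5.3"],
    "6.5.3": ["6.7.3"],
}


def shortest_upgrade_path(start, target, hops=SUPPORTED_HOPS):
    """Breadth-first search for the path with the fewest hops, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in hops.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_upgrade_path("5.3.3", "6.7.3")
print(" -> ".join(path))
```

If a direct 5.3.3 -> 6.x hop turns out to be supported on this hardware, adding it to the table would shorten the result, which is exactly the open question here.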

 

So is turboboot usually considered a last resort because the standard process installs packages and SMUs in one command and preserves the config, while turboboot means starting from scratch (boot min.vm, add packages, add SMUs, add config)? Or are there other reasons or risks for not doing turboboot, given that we are essentially jumping directly from 5.3.3 to 6.7.3 anyway?

 

thanks

 

Mark