03-19-2018 09:44 PM - edited 03-01-2019 03:22 PM
Hello All,
We have an ASR 9006 whose A9K-RSP-4G is not stable in standby mode: it reboots again and again. If I remove the active RSP, it comes up in the active state and remains stable. Logs from both RSPs are given below.
Thanks in advance.
RP/0/RSP1/CPU0:ios(admin)#show RP/0/RSP1/CPU0:Jan 5 22:55:51.047 : licmgr[315]: %LICENSE-LICMGR-4-PACKAGE_LICENSE_INVALID : Package requesting A9K-LI-LIC license is activated on Rack0 and node node0_RSP1_CPU0 without a valid license/ valid configuration
RP/0/RSP1/CPU0:ios(admin)#show RP/0/RSP1/CPU0:Jan 5 22:54:23.107 : wdsysmon[469]: %HA-HA_WD-4-DISK_ALARM : A monitored device alarm set by /disk0:
RP/0/RSP0/CPU0:Jan 5 22:54:23.108 : wdsysmon[469]: %HA-HA_WD-4-DISK_WARN : A monitored device /disk0: is above 80% utilization. Current utilization = 94. Please remove unwanted user files and configuration rollback points.
RP/0/RSP0/CPU0:Jan 5 22:54:23.108 : wdsysmon[469]: %HA-HA_WD-4-DISK_WARN : A monitored device /disk1: is above 80% utilization. Current utilization = 94. Please remove unwanted user files and configuration rollback points.
:Jan 5 22:58:11.106 : licmgr[315]: %LICENSE-LICMGR-4-PACKAGE_LICENSE_INVALID : Package requesting A9K-LI-LIC license is activated on Rack0 and node node0_RSP1_CPU0 without a valid license/ valid configuration
pm
:Jan 5 22:58:18.290 : FABMGR[220]Writing crashinfo
Active processes:
pkg/bin/redfs_svr Thread ID 1 on cpu 0
Active processes:
pkg/bin/pfm_node_rp Thread ID 0 on cpu 1
[0xfc069a720] Record Reboot History, reboot cause = 0x2c00001b, descr = Cause: pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 331880 (fabmgr), Fault Sev: 0, Target node: 0/RSP1/CPU0, CompId: 0x15, Device Handle: 0x1033000, CondI[0xfc2b895c2] Record crashinfo
[0xfc306930a] Record Syslog
2000-01-05 22:58:18.336
NOTE: This is NOT a Kernel Crash. This crash was triggered
by the process 'pfm_node_rp', by calling reboot API.
Crash Reason: Cause: pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 331880 (fabmgr), Fault Sev: 0, Target node: 0/RSP1/CPU0, CompId: 0x15, Device Handle: 0x1033000, CondID: 8705, Fault Reason: Fabmgr encountered fatal fault. Switchover. Proces (Cause Code: 0x2c00001b)
Exception at 0x4a29b3b4 signal 5 c=1 f=3
Active process(s):
pkg/bin/redfs_svr Thread ID 1 on cpu 0
pkg/bin/pfm_node_rp Thread ID 0 on cpu 1
REGISTER INFO
r0 r1 r2 r3
R0 4a29b3b0 e7ffdb90 50013e30 00000003
r4 r5 r6 r7
R4 2800001b e7ffe205 e7ffdb68 00000000
r8 r9 r10 r11
R8 af9e7800 00000000 13e1c08b e7ffdb90
r12 r13 r14 r15
R12 4a2d6878 50013e30 e7fffb10 00000001
r16 r17 r18 r19
R16 e7fffb24 e7ffe6b0 00000000 00000000
r20 r21 r22 r23
R20 00000000 ec3cb164 e7ffe205 e7ffdb98
r24 r25 r26 r27
R24 e7ffdef3 e7ffe205 e7ffdef6 2800001b
r28 r29 r30 r31
R28 44002082 00000000 ec01f9e0 e7ffdb90
cnt lr msr pc
R32 4a203254 4a29b3b0 0002d932 4a29b3b4
cnd xer
R36 44002084 20000000
SUPERVISOR REGISTERS
Memory Management Registers
Instruction BAT Registers
Index # Value
IBAT0U # 0x1ffe
IBAT0L # 0x12
IBAT1U # 0
IBAT1L # 0
IBAT2U # 0
IBAT2L # 0
IBAT3U # 0xfffc0003
IBAT3L # 0x60011
IBAT4U # 0x4a0003ff
IBAT4L # 0x70000011
IBAT5U # 0x4c0003ff
IBAT5L # 0x72000011
IBAT6U # 0x4e0003ff
IBAT6L # 0x74000011
IBAT7U # 0
IBAT7L # 0
Data BAT Registers
Index # Value
DBAT0U # 0x1ffe
DBAT0L # 0x12
DBAT1U # 0x34000002
DBAT1L # 0xdc00002a
DBAT2U # 0x3000001e
DBAT2L # 0xc800002a
DBAT3U # 0xfffc0003
DBAT3L # 0x60011
DBAT4U # 0x4a0003ff
DBAT4L # 0x70000011
DBAT5U # 0x4c0003ff
DBAT5L # 0x72000011
DBAT6U # 0x4e0003ff
DBAT6L # 0x74000011
DBAT7U # 0
DBAT7L # 0
Exception Handling Registers
Data Addr Reg # DSISR
0 # 0
SPRG0 # SPRG1 # SPRG2 # SPRG3
0xe7ffdb90 # 0xec01f9e0 # 0 # 0x1
SaveNRestore SRR0 # SaveNRestore SRR1
0x4a29b3b0 # 0x2d932
Miscellaneous Registers
Processor Id Reg # 0
HID0 # 0x8493c1bc
HID1 # 0x2cc80
MSSCR0 # 0
MSSSR0 # 0
STACK TRACE
#0 0x4a29b3b0
[0xfe589af7a] Initializing harddisk file system
[0xfe9ede87d] Record TSEC information
!!!
Writing TSEC done
!!
Writing crashinfo done!
Examine crashinfo file for reboot reason
Writing ppc kernel core file
kernel core device: /kernel_core.by.pfm_node_rp.Z
[0xfec054d30] Kernel core dump start...
fill phdr vaddr=0x8f94000, offset=0x191a3d4, size=0x706c000
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Core dump success. Total_size 144204756
[0x10312ba137] Successfully dumped Kernel core
[0x103196e451] Record PCDS information
Writing PCDS done
Dump Directory
KD: RSP1.000105-225818.tsec, start = 1000, size = 7024, crc = 0
KD: RSP1.000105-225818.crashinfo.by.pfm_node_rp, start = 9000, size = ae7a, crc = 0
KD: RSP1.000105-225818.kernel_core.by.pfm_node_rp.Z, start = 14000, size = 2c21b08, crc = 57577412
KD: RSP1.000105-225818.pcds, start = 2c36000, size = ff000, crc = faca6283
Writing kernel core file done!
rebooting
Selecting ROMMON Image... B
DDR in Interleaved mode
POST 1 : PASSED : code 0 : DDR2 Memory Quick Test
CPU Reset Reason = 0x000d
POST 2 : PASSED : code 0 : FPGA Flash Image CRC Checks
Loading Field Programmable Devices:
FPGA 0-B PROGRAMMED : image: 0xff500028 - 0xff576cca, et: 117ms
FPGA 1-B PROGRAMMED : image: 0xff400028 - 0xff4d1034, et: 206ms
FPGA 2-B PROGRAMMED : image: 0xff100028 - 0xff276358, et: 369ms
FPGA 3-B PROGRAMMED : image: 0xff000028 - 0xff0454a8, et: 69ms
System Bootstrap, Version 1.06(20120210:003513) [ASR9K ROMMON],
Copyright (c) 1994-2012 by Cisco Systems, Inc.
Compiled Thu 09-Feb-12 16:35 by saurabja
CPUCtrl: 1.18 [00000001/00000012]
ClkCtrl: 1.23 [00000001/00000017]
IntCtrl: 1.15 [00000001/0000000f]
Punt: 1.5 [00000001/00000005]
CBC: 1.3
BID: 0x0006
PPC 8641D (partnum 0x8004), Revision 03.00, (Core Version 02.02)
M8641 CLKIN: 66 Mhz
Core Clock: 1333 Mhz
MPX Clock: 533 Mhz
LBC Clock: 33 Mhz
POST 3 : PASSED : code 0 : Slot ID/Board Type Validity
PCI-E1: Ready as Root Complex
PCI-E2: Ready as Root Complex
set_chassis_type: chassis_type=0xef02fb found=TRUE
ASR9K (8641D PPC) platform with 4096 Mb of main memory
program load complete, entry point: 0x100000, size: 0x2ac20
program load complete, entry point: 0x100000, size: 0x2ac20
MBI Candidate = disk0:asr9k-os-mbi-5.2.2/0x100000/mbiasr9k-rp.vm
CARD_SLOT_NUMBER: 1
CPU_INSTANCE: 1
MBI Validation starts ...
Mgt LAN 0 interface is selected
tsec_init_hw: configuring FE (port 2) for: Auto Speed, Auto Duplex
tsec_init_interface: hardware initialization completed
Interface link changed state to UP.
Interface link state up.
MBI validation sending request.
HIT CTRL-C to abort
..........
No MBI confirmation received from dSC
AUTOBOOT: Boot string = disk0:asr9k-os-mbi-5.2.2/0x100000/mbiasr9k-rp.vm,1;
AUTOBOOT: autobootstate=0, autobootcount=0, cmd=boot disk0:asr9k-os-mbi-5.2.2/0x100000/mbiasr9k-rp.vm
program load complete, entry point: 0x100000, size: 0x2ac20
MBI size from header = 20163132,Bootflash resident MBI filesize = 20163132
.............................................................................
program load complete, entry point: 0x203d78, size: 0x1339b3c
Attempting to start second CPU
Config = SMP, Running = SMP
Board type: 0x00100302
Card Capability = 0xffffffff
########################################################################################################
BSP: Board type : RO-RSP2
tracelogger: starting tracing in background ring mode
tracelogger running with args: -startring -F 1 -F 2
Restricted Rights Legend
Use, duplication, or disclosure by the Government is
subject to restrictions as set forth in subparagraph
(c) of the Commercial Computer Software - Restricted
Rights clause at FAR sec. 52.227-19 and subparagraph
(c) (1) (ii) of the Rights in Technical Data and Computer
Software clause at DFARS sec. 252.227-7013.
cisco Systems, Inc.
170 West Tasman Drive
San Jose, California 95134-1706
Cisco IOS XR Software for the Cisco XR ASR9K, Version 5.2.2
Copyright (c) 2014 by Cisco Systems, Inc.
File RSP1.000105-225818.pcds content has been changed during previous dump process. There could be some residual HW activities. It can be ignored for now.
Jan 05 23:00:06.266: Install Setup: Booting with committed software
Failed to rename debug file, 18, src: /nvram:/sysmgr.log.timeout.Z, target: /nvram:/prev.sysmgr.log.timeout.Z
Jan 05 23:04:25.498 : SYSMGR_LITE: Saving init logs in /nvram:/sysmgr.log.timeout.Z ...
FPD ltrace_file_name => fpd-agent/fiarsp
SAM detects CA certificate(Code Signing Server Certificate Authority,O=Cisco,C=US) has expired. The validity period is Oct 17, 2000 01:46:24 UTC - Oct 17, 2015 01:51:47 UTC. Continue at risk? (Y/N) [Default: N w/in 10]: Jan 05 23:11:29.618: Install Setup: Cleaning packages not in sync list
Jan 05 23:11:29.743: Install Setup: Complete
TAS1 con0/RSP1/CPU0 is in standby
FPD ltrace_file_name => fpd-agent/longbeach
FPD ltrace_file_name => fpd-agent/tempo
Jan 05 23:12:21.927: Install Setup: Syncing meta-data:
Jan 05 23:12:23.388: Install Setup: Complete
SRESET Exception on Core 1....
Writing crashinfo
Active processes:
proc/boot/procnto-booke-smp-instr Thread ID 0 on cpu 0
Active processes:
pkg/bin/redfs_svr Thread ID 1 on cpu 1
[0x1927967392] Record Reboot History, reboot cause = 0x28000125, descr = SRESET Exception
[0x19286bc0dc] Record crashinfo
[0x1928bbbd4d] Record Syslog
2000-01-05 23:12:40.862
Crash Reason: SRESET Exception (Cause Code: 0x28000125)
Exception at 0x4000043c signal 5 c=1 f=3
Active process(s):
proc/boot/procnto-booke-smp-instr Thread ID 0 on cpu 0
pkg/bin/redfs_svr Thread ID 1 on cpu 1
REGISTER INFO
r0 r1 r2 r3
R0 40000438 0ff8ef90 500083b0 5001a4d4
r4 r5 r6 r7
R4 e7fffc3b 00000014 30004500 00000005
r8 r9 r10 r11
R8 00009036 5001a000 00000000 0ff8ef90
r12 r13 r14 r15
R12 5000007c 500083b0 00000000 0fffe790
r16 r17 r18 r19
R16 901b285f 50000000 50016b94 00000000
r20 r21 r22 r23
R20 0bfa1554 00000000 00000001 00000000
r24 r25 r26 r27
R24 e7f1bfc0 0ffeb560 00060000 0fffe790
r28 r29 r30 r31
R28 00b4b214 50000824 0efdfa44 0ff8ef90
cnt lr msr pc
R32 00000000 40000438 00029036 4000043c
cnd xer
R36 44002082 20000000
SUPERVISOR REGISTERS
Memory Management Registers
Instruction BAT Registers
Index # Value
IBAT0U # 0x1ffe
IBAT0L # 0x12
IBAT1U # 0
IBAT1L # 0
IBAT2U # 0
IBAT2L # 0
IBAT3U # 0xfffc0003
IBAT3L # 0x60011
IBAT4U # 0x4a0003ff
IBAT4L # 0x70000011
IBAT5U # 0x4c0003ff
IBAT5L # 0x72000011
IBAT6U # 0x4e0003ff
IBAT6L # 0x74000011
IBAT7U # 0
IBAT7L # 0
Data BAT Registers
Index # Value
DBAT0U # 0x1ffe
DBAT0L # 0x12
DBAT1U # 0x34000002
DBAT1L # 0xdc00002a
DBAT2U # 0x3000001e
DBAT2L # 0xc800002a
DBAT3U # 0xfffc0003
DBAT3L # 0x60011
DBAT4U # 0x4a0003ff
DBAT4L # 0x70000011
DBAT5U # 0x4c0003ff
DBAT5L # 0x72000011
DBAT6U # 0x4e0003ff
DBAT6L # 0x74000011
DBAT7U # 0
DBAT7L # 0
Exception Handling Registers
Data Addr Reg # DSISR
0 # 0
SPRG0 # SPRG1 # SPRG2 # SPRG3
0xff8ef90 # 0xefdfa44 # 0x50000824 # 0x1
SaveNRestore SRR0 # SaveNRestore SRR1
0x40000438 # 0x29036
Miscellaneous Registers
Processor Id Reg # 0
HID0 # 0x8493c1bc
HID1 # 0x2cc80
MSSCR0 # 0
MSSSR0 # 0
STACK TRACE
#0 0x40000438
[0x19481af89f] Initializing harddisk file system
[0x194c814a0c] Record TSEC information
!!!
Writing TSEC done
!!
Writing crashinfo done!
Examine crashinfo file for reboot reason
Writing ppc kernel core file
kernel core device: /kernel_core.by.kernel.Z
[0x194e8d064a] Kernel core dump start...
fill phdr vaddr=0xfc7b000, offset=0x5ca4514, size=0x385000
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Core dump success. Total_size 100832532
[0x197704625c] Successfully dumped Kernel core
[0x19776fa5b9] Record PCDS information
!
Writing PCDS done
Dump Directory
KD: RSP1.000105-231240.tsec, start = 1000, size = 7024, crc = 0
KD: RSP1.000105-231240.crashinfo.by.kernel, start = 9000, size = ad23, crc = 0
KD: RSP1.000105-231240.kernel_core.by.kernel.Z, start = 14000, size = 162cf0d, crc = 2cca17c5
KD: RSP1.000105-231240.pcds, start = 1641000, size = ff000, crc = 18ff3d46
Writing kernel core file done!
rebooting
Selecting ROMMON Image... B
DDR in Interleaved mode
POST 1 : PASSED : code 0 : DDR2 Memory Quick Test
CPU Reset Reason = 0x000b
POST 2 : PASSED : code 0 : FPGA Flash Image CRC Checks
Loading Field Programmable Devices:
FPGA 0-B PROGRAMMED : image: 0xff500028 - 0xff576cca, et: 117ms
FPGA 1-B PROGRAMMED : image: 0xff400028 - 0xff4d1034, et: 206ms
FPGA 2-B PROGRAMMED : image: 0xff100028 - 0xff276358, et: 369ms
FPGA 3-B PROGRAMMED : image: 0xff000028 - 0xff0454a8, et: 69ms
System Bootstrap, Version 1.06(20120210:003513) [ASR9K ROMMON],
Copyright (c) 1994-2012 by Cisco Systems, Inc.
Compiled Thu 09-Feb-12 16:35 by saurabja
CPUCtrl: 1.18 [00000001/00000012]
ClkCtrl: 1.23 [00000001/00000017]
IntCtrl: 1.15 [00000001/0000000f]
Punt: 1.5 [00000001/00000005]
CBC: 1.3
BID: 0x0006
PPC 8641D (partnum 0x8004), Revision 03.00, (Core Version 02.02)
M8641 CLKIN: 66 Mhz
Core Clock: 1333 Mhz
MPX Clock: 533 Mhz
LBC Clock: 33 Mhz
POST 3 : PASSED : code 0 : Slot ID/Board Type Validity
PCI-E1: Ready as Root Complex
PCI-E2: Ready as Root Complex
set_chassis_type: chassis_type=0xef02fb found=TRUE
ASR9K (8641D PPC) platform with 4096 Mb of main memory
program load complete, entry point: 0x100000, size: 0x2ac20
program load complete, entry point: 0x100000, size: 0x2ac20
MBI Candidate = disk0:asr9k-os-mbi-5.2.2/0x100000/mbiasr9k-rp.vm
CARD_SLOT_NUMBER: 1
CPU_INSTANCE: 1
MBI Validation starts ...
Mgt LAN 0 interface is selected
tsec_init_hw: configuring FE (port 2) for: Auto Speed, Auto Duplex
tsec_init_interface: hardware initialization completed
Interface link changed state to UP.
Interface link state up.
MBI validation sending request.
HIT CTRL-C to abort
mbi_val_process_packet: received repsonse (rack 0)
Local image to boot : disk0:asr9k-os-mbi-5.2.2/0x100000/mbiasr9k-rp.vm
program load complete, entry point: 0x100000, size: 0x2ac20
MBI size from header = 20163132,Bootflash resident MBI filesize = 20163132
.............................................................................
program load complete, entry point: 0x203d78, size: 0x1339b3c
Attempting to start second CPU
Config = SMP, Running = SMP
Board type: 0x00100302
Card Capability = 0xffffffff
########################################################################################################
BSP: Board type : RO-RSP2
tracelogger: starting tracing in background ring mode
tracelogger running with args: -startring -F 1 -F 2
Restricted Rights Legend
Use, duplication, or disclosure by the Government is
subject to restrictions as set forth in subparagraph
(c) of the Commercial Computer Software - Restricted
Rights clause at FAR sec. 52.227-19 and subparagraph
(c) (1) (ii) of the Rights in Technical Data and Computer
Software clause at DFARS sec. 252.227-7013.
cisco Systems, Inc.
170 West Tasman Drive
San Jose, California 95134-1706
Cisco IOS XR Software for the Cisco XR ASR9K, Version 5.2.2
Copyright (c) 2014 by Cisco Systems, Inc.
Jan 05 23:13:51.539: Install Setup: Using install device 'disk0:'
File RSP1.000105-231240.pcds content has been changed during previous dump process. There could be some residual HW activities. It can be ignored for now.
Jan 05 23:14:01.580: Install Setup: Using MBI device 'bootflash:'
Jan 05 23:14:01.644: Install Setup: Preparing devices:
Jan 05 23:14:01.657: Install Setup: Complete
Jan 05 23:14:13.067: Install Setup: Starting package and meta-data sync
Jan 05 23:14:13.086: Install Setup: Cleaning packages not in sync list
Jan 05 23:14:13.093: Install Setup: Complete
Jan 05 23:14:20.325: Install Setup: Syncing meta-data:
Jan 05 23:14:21.802: Install Setup: Complete
Jan 05 23:14:21.802: Install Setup: Completed sync of all packages and meta-data
Jan 05 23:14:21.802: Install Setup: Starting MBI sync
Jan 05 23:14:37.771: Install Setup: Completed sync of MBIs
Thanks & Regards,
Abinash Kumar
03-20-2018 06:34 AM
Lack of a SW license and/or lack of space on the disk are not causing the standby reload. Nevertheless, please take care of those two items.
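For anyone who wants to track those disk warnings while cleaning up, here is a minimal Python sketch that pulls the device, threshold and current utilization out of `DISK_WARN` lines. The regular expression is only based on the message format visible in the log posted above, and the function name `disk_warnings` is made up for illustration; other releases may word the message differently.

```python
import re

# Pattern derived from the wdsysmon DISK_WARN lines in the posted log.
# Assumption: the message wording is the same on the release being checked.
DISK_WARN = re.compile(
    r"%HA-HA_WD-4-DISK_WARN : A monitored device (?P<device>\S+) is above "
    r"(?P<threshold>\d+)% utilization\. Current utilization = (?P<current>\d+)\."
)


def disk_warnings(log_text):
    """Return (device, threshold, current) for every DISK_WARN message found."""
    return [
        (m.group("device"), int(m.group("threshold")), int(m.group("current")))
        for m in DISK_WARN.finditer(log_text)
    ]


# Two lines copied from the log in the original post.
sample = (
    "RP/0/RSP0/CPU0:Jan 5 22:54:23.108 : wdsysmon[469]: %HA-HA_WD-4-DISK_WARN : "
    "A monitored device /disk0: is above 80% utilization. Current utilization = 94. "
    "Please remove unwanted user files and configuration rollback points.\n"
    "RP/0/RSP0/CPU0:Jan 5 22:54:23.108 : wdsysmon[469]: %HA-HA_WD-4-DISK_WARN : "
    "A monitored device /disk1: is above 80% utilization. Current utilization = 94. "
    "Please remove unwanted user files and configuration rollback points.\n"
)

for device, threshold, current in disk_warnings(sample):
    print(f"{device} is at {current}% (warning threshold {threshold}%)")
```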
Also see https://www.cisco.com/c/en/us/support/docs/field-notices/639/fn63979.html because the router needs the new SW signing method.
The reason for the standby RSP reset has to do with the fabric connection:
Crash Reason: Cause: pfm_dev_sm_perform_recovery_action, Card reset requested by: Process ID: 331880 (fabmgr),...
It would be good to upgrade the router to 5.3.4 plus latest Service Pack to eliminate any SW issue.
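When a console capture contains several reload cycles, it can help to list every crash reason side by side. The following is a rough Python sketch, not a Cisco tool: it scans captured console text for `Crash Reason:` lines and reports the cause code and, where present, the process that requested the card reset. The patterns are assumptions based only on the two crash records in the log above, and `summarize_crashes` is a hypothetical helper name.

```python
import re

# Patterns derived from the "Crash Reason:" lines in the posted console log.
CRASH = re.compile(
    r"Crash Reason: (?P<reason>.*?) \(Cause Code: (?P<code>0x[0-9a-fA-F]+)\)"
)
REQUESTER = re.compile(
    r"Card reset requested by: Process ID: (?P<pid>\d+) \((?P<name>\w+)\)"
)


def summarize_crashes(console_text):
    """Return one summary dict per 'Crash Reason' line in the console text."""
    summaries = []
    for match in CRASH.finditer(console_text):
        reason = match.group("reason")
        requester = REQUESTER.search(reason)
        summaries.append(
            {
                "cause_code": match.group("code"),
                # None when the record does not name a requesting process.
                "requested_by": requester.group("name") if requester else None,
                "reason": reason,
            }
        )
    return summaries


# The two crash records from the original post, copied verbatim.
sample = (
    "Crash Reason: Cause: pfm_dev_sm_perform_recovery_action, Card reset requested by: "
    "Process ID: 331880 (fabmgr), Fault Sev: 0, Target node: 0/RSP1/CPU0, CompId: 0x15, "
    "Device Handle: 0x1033000, CondID: 8705, Fault Reason: Fabmgr encountered fatal fault. "
    "Switchover. Proces (Cause Code: 0x2c00001b)\n"
    "Crash Reason: SRESET Exception (Cause Code: 0x28000125)\n"
)

for crash in summarize_crashes(sample):
    print(crash["cause_code"], crash["requested_by"])
```

On the log in this thread, the first record names fabmgr as the requester and the second (the SRESET exception) names no process.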
03-21-2018 04:45 AM
Thanks a lot Aleksandar Vidakovic.
As per your suggestion, I will try to upgrade to 5.3.4 to eliminate any sw issues.
Thanks & Regards,
Abinash Kumar
03-21-2018 04:47 AM
Hi Abinash,
That will be very good indeed. Please don't take just the base 5.3.4; install the latest Service Pack as well. If the router is already running 5.1.3 or later, you can activate 5.3.4+SP in one go.
/Aleksandar
03-22-2018 10:23 AM
03-22-2018 10:25 AM
03-22-2018 10:37 AM
Sounds like it might be CSCus81167
You could try removing the inactive packages ("admin install remove inactive").
03-22-2018 11:45 AM
It's possible that the installation on active got somehow corrupted, because it complains about one specific package that couldn't be synced (iosxr-fwding). Try running "admin install verify packages" and "admin install verify packages repair" on the active.
In any case, it would be good to upgrade to 5.3.4 plus the latest Service Pack. That recommendation applies if this router has any Trident line card. If all cards are Typhoon, our recommendation is 6.2.3.
/Aleksandar
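The release guidance in this thread can be written down as a small decision helper. This Python sketch only encodes what was said above (5.3.4 plus the latest Service Pack if any Trident line card is present, 6.2.3 if all cards are Typhoon, and activation in one go from 5.1.3 or later). The function names and the "trident"/"typhoon" labels are hypothetical inputs that the operator has to supply, and anything the thread does not cover returns `None` rather than a guess.

```python
def suggested_release(line_card_generations):
    """Map the line card generations in a chassis to the release suggested in this thread.

    line_card_generations: iterable of labels such as "trident" or "typhoon"
    (hypothetical labels chosen for this sketch).
    """
    cards = {generation.strip().lower() for generation in line_card_generations}
    if "trident" in cards:
        return "5.3.4 plus latest Service Pack"
    if cards == {"typhoon"}:
        return "6.2.3"
    # Empty input or other generations: the thread gives no guidance.
    return None


def can_activate_in_one_step(current_version):
    """True if the running release is 5.1.3 or later, per the advice above."""
    parts = tuple(int(part) for part in current_version.split("."))
    return parts >= (5, 1, 3)


# The router in this thread boots IOS XR 5.2.2 according to the posted log.
print(can_activate_in_one_step("5.2.2"))
print(suggested_release(["typhoon", "trident"]))
```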
03-22-2018 12:05 PM
04-09-2018 11:27 PM
Hi Aleksandar Vidakovic,
I have upgraded the RSP as you instructed, but it is behaving the same. There might be a hardware issue.
Thanks for your support.
Best regards,
Abinash kumar.
04-10-2018 01:35 AM