08-07-2013 03:25 AM - edited 03-04-2019 08:41 PM
Hello,
This router had been running fine for several years.
We had to carry out electrical maintenance yesterday morning, and the router had to be powered down.
When it was powered back on, it started crashing systematically right after booting.
We've tried removing the linecards and keeping only the RSP720 module.
We've disconnected everything from power for a while.
We've tried 2 different IOS versions.
It still crashes in software every time.
Looking through old syslogs, I found the following error repeating:
SP: FAILED to write to DS1338 RTC device
Has anyone had a similar issue?
Thanks,
These are the only errors or odd messages I found during boot (for everything else, check the attachment):
booting 15.1.3-S4
*Aug 7 09:48:19.103: %DIAG-3-CARD_ABSENT: è4xÁ¬è is not detected
*Aug 7 09:48:19.107: scp assert failure: queue != NULL: ../const/native/scp_const.c: 940
*Aug 7 09:48:19.107: -Traceback= 81BDC04z 86ABD28z 86AC3F0z 8C7BF6Cz 83A9A4Cz 83A9C3Cz 83850A0z 83977A0z 837D8B4z 8C7C000z 83A90A0z 83A32D4z
*Aug 7 09:48:19.107: %SCHED-7-WATCH: Attempt to monitor uninitialized watched queue (address 0). -Process= "slcp process", ipl= 0, pid= 138
-Traceback= 81BC060z 837F07Cz 86AC400z 8C7BF6Cz 83A9A4Cz 83A9C3Cz 83850A0z 83977A0z 837D8B4z 8C7C000z 83A90A0z 83A32D4z
*Aug 7 09:48:20.175: %OIR-6-CONSOLE: Changing console ownership to switch processor
*** System received a Software forced crash ***
signal= 0x17, code= 0x1500, context= 0xd77a148
PC = 0x83a74e0, Vector = 0x1500, SP = 0x146a6bc8
000028: *Aug 7 11:53:26.165: %DIAG-SP-6-RUN_MINIMUM: Module 1: Running Minimal Diagnostics...
%Software-forced reload
11:53:34 CEST Wed Aug 7 2013: Unexpected exception to CPU: vector 1500, PC = 0xB6A5E84 , LR = 0xB6A5E18
-Traceback= 0xB6A5E84z 0xB6A5E18z 0xB341684z 0xB190AFCz 0xB1DD090z 0xB1DD0F4z 0x8D600BCz 0x8D61450z 0x8D61FA4z 0x8D6255Cz 0xB06435Cz 0xB064AFCz 0xB71C0F0z 0xB71C430z 0xB71DE90z 0xB69692Cz
*** System received a Software forced crash ***
signal= 0x17, code= 0x1500, context= 0x12e4c850
PC = 0xb6a5e84, Vector = 0x1500, SP = 0x1ba51fa8
System Bootstrap, Version 12.2(33r)SRD5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 2009 by cisco Systems, Inc.
*Aug 7 11:53:36.013: %SYS-SP-3-LOGGER_FLUSHING: System pausing to ensure console debugging output.
*Aug 7 11:53:35.229: %C7600_ENV-SP-4-FANCOUNTFAILED: Required number of fan trays is not present
*Aug 7 11:53:36.013: %OIR-SP-6-CONSOLE: Changing console ownership to switch processor
*** System received a Software forced crash ***
signal= 0x17, code= 0x1500, context= 0xd77a148
PC = 0x83a74e0, Vector = 0x1500, SP = 0x17ecd450
booting 12.2.33-SRE1
000019: *Aug 7 12:06:36.975: %FABRIC-SP-5-FABRIC_MODULE_ACTIVE: The Switch Fabric Module in slot 1 became active.
000020: *Aug 7 12:06:38.987: %DIAG-SP-6-RUN_MINIMUM: Module 1: Running Minimal Diagnostics...
%Software-forced reload
12:06:46 CEST Wed Aug 7 2013: Unexpected exception to CPU: vector 1500, PC = 0xAE94130 , LR = 0xAE940F4
-Traceback= 0xAE94130 0xAE940F4 0xAF27184 0xAE85F2C 0xAF41170 0xAF41170 0xAE85FE8 0xAE90BAC 0xAE90C30 0xAE8D178
*** System received a Software forced crash ***
signal= 0x17, code= 0x1500, context= 0xe5b73c8
PC = 0xae94130, Vector = 0x1500, SP = 0x158a99b8
e
*Aug 7 12:06:48.059: %SYS-SP-3-LOGGER_FLUSHING: System pausing to ensure console debugging output.
*Aug 7 12:06:46.895: %EARL-SP-2-PATCH_INVOCATION_LIMIT: 10 Recovery patch invocations in the last 30 secs have been attempted. Max limit reached
*Aug 7 12:06:48.059: %OIR-SP-6-CONSOLE: Changing console ownership to switch processor
*** System received a Software forced crash ***
signal= 0x17, code= 0x1500, context= 0xd247660
PC = 0x835948c, Vector = 0x1500, SP = 0x16026890
08-07-2013 10:15 AM
Hello,
I have finished analyzing the outputs you provided. Please find the problem analysis and next action plan below.
Problem Analysis:-
=================
The RSP720 comprises a Switch Processor (SP) and a Route Processor (RP). I believe the SP and the RP are set with different configuration register values. A value of 0x0 was set on the SP, which means that "break" is enabled. The SP must therefore have been receiving a signal that it interpreted as a break, and this was causing the router to reload.
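The analysis above hinges on what individual configuration-register bits mean. The sketch below decodes the values that come up in this thread (0x0, 0x2102 and 0x2142). It assumes the commonly documented Cisco bit layout (boot field in bits 0-3, bit 6 = ignore NVRAM configuration, bit 8 = Break disabled, bit 13 = boot ROM software if netboot fails); `decode_confreg` is a hypothetical helper, baud-rate bits are not decoded, and the bit meanings should be checked against the documentation for your platform.

```python
# Hypothetical helper: decode a few bits of a Cisco configuration register.
# Assumes the commonly documented bit layout; verify it for your platform.

BOOT_FIELD_MASK = 0x000F    # bits 0-3: boot field
IGNORE_NVRAM = 0x0040       # bit 6: ignore the startup configuration in NVRAM
BREAK_DISABLED = 0x0100     # bit 8: Break key is disabled when this bit is set
ROM_AFTER_NETBOOT = 0x2000  # bit 13: boot default ROM software if netboot fails


def decode_confreg(value: int) -> dict:
    """Return a summary of the boot field and selected flag bits."""
    boot = value & BOOT_FIELD_MASK
    if boot == 0x0:
        boot_mode = "stay in ROMMON"
    elif boot == 0x1:
        boot_mode = "platform-dependent"
    else:
        boot_mode = "use boot system commands"
    return {
        "boot_mode": boot_mode,
        # Break is honoured only while bit 8 is clear.
        "break_enabled": not (value & BREAK_DISABLED),
        "ignore_nvram_config": bool(value & IGNORE_NVRAM),
        "rom_after_netboot_fails": bool(value & ROM_AFTER_NETBOOT),
    }


if __name__ == "__main__":
    for reg in (0x0, 0x2102, 0x2142):
        print(hex(reg), decode_confreg(reg))
```

Under these assumptions, 0x0 leaves Break enabled and stays in ROMMON, 0x2102 disables Break and boots from the boot system commands, and 0x2142 additionally ignores the NVRAM configuration. The last of these matches the intent of the 0x2142 attempt described later in the thread.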
Next Action Plan:-
=================
Change the configuration register on both the Switch Processor and the Route Processor to the same value, 0x2102.
The procedure is as follows.
Step 1
Remove the RSP720 from the chassis. Make sure there are enough power supplies and fan trays to support normal operation of the chassis. Then reinsert the RSP720. While it boots, wait for the following log message to be displayed:
%OIR-6-CONSOLE: Changing console ownership to switch processor
As soon as you see this message, break into SP ROMMON mode.
Set the configuration register to 0x2102 (confreg 0x2102), then run sync, then reset.
Step 2
Then wait until the following message is displayed:
%OIR-SP-6-CONSOLE: Changing console ownership to route processor
As soon as you see this message, break into RP ROMMON mode.
Set the configuration register to 0x2102 (confreg 0x2102), then run sync, then reset.
This should help in resolving the issue.
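The procedure depends on knowing which processor currently owns the console, and therefore which ROMMON a break will land in. The sketch below classifies console text using the two `%OIR...-CONSOLE` messages quoted above and the `C7600-RSP720/SP` or `/RP` ROMMON banner seen in the logs in this thread. The helper names are hypothetical, and the code only reads text. It does not send a break or talk to the router.

```python
import re

# Matches both "%OIR-6-CONSOLE" (printed by the RP) and "%OIR-SP-6-CONSOLE" (printed by the SP).
OWNERSHIP_RE = re.compile(
    r"%OIR-(?:SP-)?6-CONSOLE: Changing console ownership to (switch|route) processor"
)
# ROMMON banner, e.g. "C7600-RSP720/SP platform with 1048576 Kbytes of main memory".
BANNER_RE = re.compile(r"C7600-RSP720/(SP|RP)\b")


def console_owner(lines):
    """Return 'SP' or 'RP' for the last ownership change seen, or None if there is none."""
    owner = None
    for line in lines:
        match = OWNERSHIP_RE.search(line)
        if match:
            owner = "SP" if match.group(1) == "switch" else "RP"
    return owner


def rommon_processor(banner_line):
    """Return 'SP' or 'RP' from a ROMMON banner line, or None if it is not a banner."""
    match = BANNER_RE.search(banner_line)
    return match.group(1) if match else None
```

For example, after the "Changing console ownership to route processor" line, `console_owner` reports "RP". That is the point at which Step 2 says to break, and the ROMMON that appears should then print a `C7600-RSP720/RP` banner.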
****Plz do rate this post without fail if you found it to be helpful*********
Thanks & Regards,
Vignesh R P
08-07-2013 10:22 AM
Hello,
Regarding the "SP: FAILED to write to DS1338 RTC device" error log, please find the explanation below.
The DS1338 IC is a real-time clock (RTC) with NVRAM: a low-power, full binary-coded decimal (BCD) clock/calendar plus 56 bytes of NV SRAM. It seems the router failed to write information to the chip. However, judging from 'show clock', the clock itself does not currently seem to be affected.
The error is cosmetic and should not have any performance impact on the router.
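"Binary-coded decimal" means that each decimal digit of a time field is stored in its own 4-bit nibble. For example, 59 seconds is stored as the byte 0x59, not as 0x3B. The minimal sketch below shows only that encoding. It assumes plain two-digit fields and does not model the DS1338 register map or any control bits a real register may carry. Those details come from the DS1338 datasheet.

```python
def to_bcd(n: int) -> int:
    """Pack a value 0-99 into one BCD byte: tens in the high nibble, units in the low nibble."""
    if not 0 <= n <= 99:
        raise ValueError("a BCD byte holds 0-99")
    return ((n // 10) << 4) | (n % 10)


def from_bcd(b: int) -> int:
    """Unpack one BCD byte back to an integer, rejecting nibbles above 9."""
    high, low = b >> 4, b & 0x0F
    if high > 9 or low > 9:
        raise ValueError(f"invalid BCD byte: {b:#04x}")
    return high * 10 + low
```

Usage: `to_bcd(59)` gives `0x59`, and `from_bcd(0x13)` gives `13`.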
Also, please let me know which IOS version these error logs were seen on.
Thanks & Regards,
Vignesh R P
08-08-2013 01:30 AM
Hello Vignesh,
Please note that at some point I configured confreg 0x2142,
in the hope that the router would manage to boot cleanly without loading the full configuration.
But that didn't help.
I will now try to change the confreg back on both the SP and the RP.
I'll get back to you when done.
However, the fact that we have software-forced reloads linked to an unexpected CPU exception makes me think the problem is not confreg-related.
Regarding the RTC failure message:
we were seeing it on 12.2(33)SRE1.
I couldn't find those messages any more after we tried 15.1.3-S4.
Thanks
08-08-2013 02:43 AM
Hello,
I still believe that the issue you are facing is due to the configuration register values. Please follow the action plan I provided. It should help you resolve the problem.
Regarding the RTC failure message, it is due to Cisco IOS bug CSCts20541. This bug affects 12.2(33)SRE1, but I expect it to be fixed in the other release you are using.
Thanks & Regards,
Vignesh R P
08-08-2013 03:55 AM
Hi Vignesh,
I've done a fresh boot session (from electrical power-on).
Short version: after configuring 0x2102, it boots and crashes 3 times and then gives up.
Below you will find the most interesting steps.
The full log is attached.
The first steps were done before power-cycling:
rommon 4 > BOOT=bootdisk:c7600rsp72043-ipservicesk9-mz.122-33.SRE1.bin
rommon 5 > sync
rommon 6 >
Disconnect power, remove the RSP, reinsert the RSP, power on:
System Bootstrap, Version 12.2(33r)SRD5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 2009 by cisco Systems, Inc.
C7600-RSP720/SP platform with 1048576 Kbytes of main memory
rommon 1 > set
PS1=rommon ! >
RELOAD_TYPE=1
NT_K=0:0:0:0
SLOTCACHE=cards;
LOG_PREFIX_VERSION=1
RET_2_RTS=06:29:05 CEST Tue Jul 23 2013
RET_2_RCALTS=
RANDOM_NUM=599429381
?=0
BSI=0
PF_REDUN_CRASH_COUNT=0
CRASHINFO=crashinfo_FAILED
BOOT=bootdisk:c7600rsp72043-ipservicesk9-mz.122-33.SRE1.bin
rommon 2 > confreg 0x2102
You must reset or power cycle for new config to take effect
rommon 3 > sync
rommon 4 > reset
Resetting .......
System Bootstrap, Version 12.2(33r)SRD5, RELEASE SOFTWARE (fc1)
<....>
Cisco IOS Software, c7600rsp72043_sp Software (c7600rsp72043_sp-IPSERVICESK9-M), Version 12.2(33)SRE1, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Mon 29-Mar-10 22:56 by prod_rel_team
*Aug 8 10:22:06.327: %SYS-SP-3-LOGGER_FLUSHING: System pausing to ensure console debugging output.
*Aug 8 10:22:03.551: %PFREDUN-6-ACTIVE: Initializing as ACTIVE processor
*Aug 8 10:22:06.327: %OIR-SP-6-CONSOLE: Changing console ownership to route processor
I hit Ctrl-Break (Fn-Break on my ThinkPad):
System Bootstrap, Version 12.2(33r)SRD5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 2009 by cisco Systems, Inc.
C7600-RSP720/RP platform with 2097152 Kbytes of main memory
rommon 1 >
So I guess I'm in the RP ROMMON now.
I set confreg 0x2102 and then reset:
rommon 1 > confreg 0x2102
rommon 2 > sync
rommon 3 > set
PS1=rommon ! >
RELOAD_TYPE=1
LOG_PREFIX_VERSION=1
SLOTCACHE=cards;
BOOT=sup-bootdisk:c7600rsp72043-advipservicesk9-mz.151-3.S4.bin,1;sup-bootdisk:c7600rsp72043-ipservicesk9-mz.122-33.SRE1.bin,1;
?=0
RET_2_RTS=12:06:24 CEST Thu Aug 8 2013
RET_2_RCALTS=1375956389
CRASHINFO=bootdisk:crashinfo_20130808-120624-CEST
rommon 4 > confreg
Configuration Summary
(Virtual Configuration Register: 0x2102)
enabled are:
[ 0 ] load rom after netboot fails
[ 1 ] console baud: 9600
boot: ...... image specified by the boot system commands or default to: cisco2-C7600-RSP720/RP
do you wish to change the configuration? y/n [n]: n
rommon 5 > reset
Resetting .......
From there on, it auto-reboots 3 times and crashes each time (software-forced reload) with the following log.
The reload always happens right after "Running Minimal Diagnostics":
000013: *Aug 8 10:28:27.846: %SYS-SP-5-RESTART: System restarted --
Cisco IOS Software, c7600rsp72043_sp Software (c7600rsp72043_sp-IPSERVICESK9-M), Version 12.2(33)SRE1, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Mon 29-Mar-10 22:56 by prod_rel_team
000014: *Aug 8 10:28:29.883: %OIR-SP-6-INSPS: Power supply inserted in slot 1
000015: *Aug 8 10:28:29.883: %C7600_PWR-SP-4-PSOK: power supply 1 turned on.
000016: *Aug 8 12:28:31.879: %SNMP-5-COLDSTART: SNMP agent on host br01.gva-cogent is undergoing a cold start
000017: *Aug 8 12:28:31.899: %FABRIC-SP-5-CLEAR_BLOCK: Clear block option is off for the fabric in slot 1.
000018: *Aug 8 12:28:31.979: %FABRIC-SP-5-FABRIC_MODULE_ACTIVE: The Switch Fabric Module in slot 1 became active.
000019: *Aug 8 12:28:33.991: %DIAG-SP-6-RUN_MINIMUM: Module 1: Running Minimal Diagnostics...
%Software-forced reload
12:28:41 CEST Thu Aug 8 2013: Unexpected exception to CPU: vector 1500, PC = 0xAE94130 , LR = 0xAE940F4
-Traceback= 0xAE94130 0xAE940F4 0xAF27184 0xAE85F2C 0xAF41170 0xAF41170 0xAE85FE8 0x98EA348 0xAE900F8 0xAE91874 0xAE8D178
CPU Register Context:
MSR = 0x00029200 CR = 0x20000022 CTR = 0x0B0A6B04 XER = 0x00000000
R0 = 0x0AE940F4 R1 = 0x158A99B8 R2 = 0xFFFCFFFC R3 = 0x156C92B0
R4 = 0x78070442 R5 = 0xDC00003C R6 = 0x78070442 R7 = 0x00000001
R8 = 0x00029200 R9 = 0x00000000 R10 = 0x14FA117C R11 = 0xFFB40000
R12 = 0x00000FF9 R13 = 0x04044000 R14 = 0x0EFF1F40 R15 = 0x0EFF202C
R16 = 0x00000001 R17 = 0x00000001 R18 = 0x00000000 R19 = 0x0D470000
R20 = 0x00000001 R21 = 0x0F015758 R22 = 0x0D470000 R23 = 0x0F0158B0
R24 = 0x0F015600 R25 = 0x0D470000 R26 = 0x0000FFFF R27 = 0xFFB40000
R28 = 0x00000005 R29 = 0x0C2E8340 R30 = 0x00021200 R31 = 0x00000000
Writing crashinfo to bootdisk:crashinfo_20130808-122841-CEST
1076 Unused bytes of context save space
*** System received a Software forced crash ***
signal= 0x17, code= 0x1500, context= 0xe5b73c8
PC = 0xae94130, Vector = 0x1500, SP = 0x158a99b8
e
*Aug 8 12:28:43.067: %SYS-SP-3-LOGGER_FLUSHING: System pausing to ensure console debugging output.
*Aug 8 12:28:41.899: %EARL-SP-2-PATCH_INVOCATION_LIMIT: 10 Recovery patch invocations in the last 30 secs have been attempted. Max limit reached
*Aug 8 12:28:43.067: %OIR-SP-6-CONSOLE: Changing console ownership to switch processor
*** System received a Software forced crash ***
signal= 0x17, code= 0x1500, context= 0xd247660
PC = 0x835948c, Vector = 0x1500, SP = 0x16026a18
System Bootstrap, Version 12.2(33r)SRD5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 2009 by cisco Systems, Inc.
C7600-RSP720/SP platform with 1048576 Kbytes of main memory
Autoboot executing command: "boot bootdisk:c7600rsp72043-ipservicesk9-mz.122-33.SRE1.bin"
Initializing ATA monitor library...
Self extracting the image... [OK]
Self decompressing the image : ########################################################################################################################################################################################################################################################################## [OK]
Cisco IOS Software, c7600rsp72043_sp Software (c7600rsp72043_sp-IPSERVICESK9-M), Version 12.2(33)SRE1, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Mon 29-Mar-10 22:56 by prod_rel_team
Active crashed three times, disabling auto-boot and dropping to rommon
%Software-forced reload
10:30:05 UTC Thu Aug 8 2013: Unexpected exception to CPU: vector 1500, PC = 0x835948C , LR = 0x8359450
-Traceback= 0x835948C 0x8359450 0x8C28ABC 0x8BE6BB8 0x8BE5C08 0x835AFD0 0x83552E8
CPU Register Context:
MSR = 0x00029200 CR = 0x40000002 CTR = 0x08E24314 XER = 0x00000000
R0 = 0x08359450 R1 = 0x13010ED0 R2 = 0xFFFCFFFC R3 = 0x11453958
R4 = 0x00000008 R5 = 0x09EA4F30 R6 = 0x0E712700 R7 = 0x12FB7F20
R8 = 0x00029200 R9 = 0x00000000 R10 = 0x12FB7F20 R11 = 0x0000014A
R12 = 0x000013BC R13 = 0x04044000 R14 = 0x08BE5BD4 R15 = 0x00000000
R16 = 0x00000000 R17 = 0x00000000 R18 = 0x00000000 R19 = 0x00000000
R20 = 0x00000000 R21 = 0x00000000 R22 = 0x00000000 R23 = 0x00000000
R24 = 0x00000000 R25 = 0x00000000 R26 = 0x0D220000 R27 = 0x00000000
R28 = 0x00000000 R29 = 0x00000007 R30 = 0x00000000 R31 = 0x00000000
------------------ show chunk failures ------------------
------------------ show redundancy states ------------------
File _20130808-103005-UTC Device Error :No such device
1085 Unused bytes of context save space
*Aug 8 10:30:06.939: %SYS-3-LOGGER_FLUSHING: System pausing to ensure console debugging output.
*Aug 8 10:30:05.675: scp assert failure: queue != NULL: ../const/native/scp_const.c: 929
*Aug 8 10:30:05.675: -Traceback= 8176198 85A3A54 85A40F4 8AFA244 835B97C 835BB50 8338FF4 834AAA0 8331954 8AFA2D8 835AFD0 83552E8
*Aug 8 10:30:05.675: %SCHED-7-WATCH: Attempt to monitor uninitialized watched queue (address 0). -Process= "slcp process", ipl= 0, pid= 135
-Traceback= 81745F8 8333068 85A4104 8AFA244 835B97C 835BB50 8338FF4 834AAA0 8331954 8AFA2D8 835AFD0 83552E8
*Aug 8 10:30:06.939: %OIR-6-CONSOLE: Changing console ownership to switch processor
*** System received a Software forced crash ***
signal= 0x17, code= 0x1500, context= 0xd247660
PC = 0x835948c, Vector = 0x1500, SP = 0x13010ed0
System Bootstrap, Version 12.2(33r)SRD5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 2009 by cisco Systems, Inc.
C7600-RSP720/SP platform with 1048576 Kbytes of main memory
rommon 1 >
At the end it gives up and drops back to ROMMON.
How do I know which ROMMON (SP or RP) I'm in?
How much time do I have to send the break?
I've tried the same with the 15.1.3-S4 release, and the same thing happens.
I don't know what to do from here.
Should we look for replacement hardware?
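The log line "Active crashed three times, disabling auto-boot and dropping to rommon" describes a simple crash counter. The sketch below is a hypothetical model of that observed behaviour, not Cisco's implementation. `boot_sequence` and the limit of 3 are taken from the log text for illustration only. Each failed boot attempt increments a counter, and once the limit is hit the system stops auto-booting.

```python
MAX_CRASHES = 3  # taken from the log message "crashed three times"


def boot_sequence(boot_attempt, max_crashes=MAX_CRASHES):
    """Hypothetical model: retry boot_attempt() until it succeeds or the crash limit is hit.

    boot_attempt is a callable that returns True if the image came up cleanly
    and False if it crashed.
    """
    crashes = 0
    while crashes < max_crashes:
        if boot_attempt():
            return "running"
        crashes += 1
    # Auto-boot is disabled; the operator is left at the ROMMON prompt.
    return "rommon"
```

With an attempt that always crashes, this returns "rommon" after exactly three tries. That matches the "rommon 1 >" prompt at the end of the log above.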
Thanks,
08-08-2013 04:08 AM
I've found another post in this forum, from about a month ago (Jul 3, 2013 8:55), about a similar crash:
https://supportforums.cisco.com/thread/2226462
%Software-forced reload
12:44:53 UTC Thu Jan 27 2000: Unexpected exception to CPU: vector 1500, PC = 0xAE94130 , LR = 0xAE940F4
-Traceback= 0xAE94130 0xAE940F4 0xAF27184 0xAE85F2C 0xAF41170 0xAF41170 0xAE85FE8 0x98E6168
This is exactly the same exception as in my case:
%Software-forced reload
12:23:51 CEST Thu Aug 8 2013: Unexpected exception to CPU: vector 1500, PC = 0xAE94130 , LR = 0xAE940F4
-Traceback= 0xAE94130 0xAE940F4 0xAF27184 0xAE85F2C 0xAF41170 0xAF41170 0xAE85FE8 0x9973158 0xAE90C30 0xAE8D178
The PC and LR are the same.
This poster was also running 12.2(33)SRE1.
These values are of course different when booting 15.1.3-S4.
I don't know whether he managed to solve his issue.
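Comparing crash signatures by eye is error-prone, so here is a small sketch of how the check could be automated. It extracts the vector, PC and LR from "Unexpected exception to CPU" lines and counts how many leading frames two `-Traceback=` lines share. The helper names are hypothetical. Trailing `z` suffixes, as seen in some tracebacks in this thread, are stripped before comparing. A shared signature only suggests that the two crashes hit the same code path. It does not by itself identify the cause.

```python
import re

EXC_RE = re.compile(
    r"Unexpected exception to CPU: vector (\w+), PC = (0x[0-9A-Fa-f]+)\s*, LR = (0x[0-9A-Fa-f]+)"
)


def exception_signature(line):
    """Return (vector, PC, LR) from an exception line, or None if the line does not match."""
    match = EXC_RE.search(line)
    if not match:
        return None
    vector, pc, lr = match.groups()
    return (vector, int(pc, 16), int(lr, 16))


def traceback_frames(line):
    """Split a '-Traceback=' line into frame addresses, dropping any trailing 'z'."""
    _, _, rest = line.partition("=")
    return [frame.rstrip("z") for frame in rest.split()]


def common_traceback_prefix(a, b):
    """Count how many leading frames two traceback lines have in common."""
    count = 0
    for fa, fb in zip(traceback_frames(a), traceback_frames(b)):
        if fa != fb:
            break
        count += 1
    return count
```

Applied to the two tracebacks quoted above, the PC/LR signatures are identical, and the first seven frames match before the tracebacks diverge.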
Thanks
08-08-2013 07:32 AM
Hello,
%ONLINE-SP-6-DNLDFAIL: Module 1, Proc. 1, Runtime image download failed because of scp send failure
This message indicates that the system was unable to download the runtime image to the RSP.
%EARL-SP-2-PATCH_INVOCATION_LIMIT: 10 Recovery patch invocations in the last 30 secs have been attempted. Max limit reached
The EARL-SP-2-PATCH_INVOCATION_LIMIT error message is related to the EARL ASIC. The EARL is a processing chip that handles packets coming to and from the chassis bus or switch fabric. The "EARL patch limit" feature is a recovery mechanism that reloads a card if it fails to take control of the bus 10 times within a 30-second interval.
Seeing these two messages every time the RSP crashes leads me to conclude that the RSP in this device has gone faulty. There is no choice left other than replacing it. Please go ahead and replace the RSP.
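The limit described above ("10 Recovery patch invocations in the last 30 secs") behaves like a sliding-window counter. The minimal sketch below models only that counting. `PatchInvocationLimiter` is a hypothetical name, and treating "reaching 10" as the trigger is an assumption based on the log wording "Max limit reached". It does not model the bus arbitration or the card reload itself.

```python
from collections import deque


class PatchInvocationLimiter:
    """Hypothetical sliding-window model: flag when `limit` invocations fall within `window` seconds."""

    def __init__(self, limit=10, window=30.0):
        self.limit = limit
        self.window = window
        self.events = deque()  # timestamps of recent invocations, oldest first

    def record(self, t):
        """Record an invocation at time t (seconds); return True once the limit is reached."""
        self.events.append(t)
        # Discard invocations that are no longer inside the last `window` seconds.
        while self.events and t - self.events[0] >= self.window:
            self.events.popleft()
        return len(self.events) >= self.limit
```

Ten invocations one second apart reach the limit on the tenth call. Invocations four seconds apart never do, because at most eight of them fit inside any 30-second window.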
Thanks & Regards,
Vignesh R P
08-08-2013 11:25 PM
OK.
That's really bad news, considering how long this RSP has served us.
Anyway, we're going for a replacement.
I'll update this post when the replacement card is running.
Thanks