Introduction
This document explains, in simple terms, the causes of the most common types of router crashes and the typical actions to take to resolve them.
False crashes
System returned to ROM by reload
Cause: the router was reloaded using the 'reload' command.
Solution: ensure that only authorized persons can execute this command.
System returned to ROM by power-on
Cause: the router encountered a power outage, was manually power cycled, or its power supply is failing.
Solution: verify the status of the power supply with the 'show environment' command and the syslogs (a failing power supply will typically be reported), and replace the power supply if it is failing. If the power supply works properly, the cause is one of the first two possibilities.
System returned to ROM by abort
Cause: the router received a BREAK signal through the console.
Solution: check the configuration register value in the 'show version' output. If it has the form 0xA0BC (A, B and C being any value; example: 0x2002), the router accepts the BREAK signal at any time. Change the configuration register so that BREAK is accepted only at bootup time: go to configuration mode, set the value to 0xA1BC (the typical value is 0x2102), then reload the router to apply it. The command is 'config-register':
Router#conf t
Router(config)#config-register ?
<0x0-0xFFFF> Config register number
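The register check described above comes down to testing a single bit. The following is a minimal Python sketch, assuming that the third hexadecimal digit in the 0xA0BC / 0xA1BC pattern corresponds to bit 8 (mask 0x0100) of the configuration register. The function names are hypothetical and only illustrate the arithmetic.

```python
# Assumption: bit 8 (mask 0x0100) is the bit that makes the router ignore
# BREAK after bootup, matching the 0xA0BC -> 0xA1BC change described above.
BREAK_DISABLE_MASK = 0x0100

def break_is_ignored(config_register: int) -> bool:
    """Return True if the register value has the Break-disable bit set."""
    return bool(config_register & BREAK_DISABLE_MASK)

def with_break_disabled(config_register: int) -> int:
    """Return the same register value with the Break-disable bit set."""
    return config_register | BREAK_DISABLE_MASK

print(break_is_ignored(0x2002))          # False
print(hex(with_break_disabled(0x2002)))  # 0x2102
```

All other bits of the register are left untouched, so only the BREAK behaviour changes.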
Parity crashes: System returned to ROM by processor memory parity error, or crash due to "Cause = 0x20"
Cause:
1) If no parity error was seen within the previous month, the crash is most likely a transient failure caused by cosmic radiation, which flipped a bit in router memory.
2) If there was more than one parity error within the last month, the crash is due to a hardware failure of the route processor.
Solution:
1) Monitor the router for the upcoming month. If no further parity error is seen, no further action is needed. If another parity error is seen, replace the route processor (remember to also replace any memory upgrades installed on it).
2) Replace the route processor directly (remember to also replace any memory upgrades installed on it).
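The decision rule above can be sketched in Python. This is only an illustration: the function name is hypothetical, and treating "one month" as 30 days is an assumption.

```python
from datetime import datetime, timedelta

def parity_action(error_times, now, window_days=30):
    """Suggest an action from the timestamps of observed parity-error crashes.

    error_times: datetimes of parity-error crashes, including the latest one.
    window_days: length of "one month"; 30 days is an assumption.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [t for t in error_times if cutoff <= t <= now]
    if len(recent) > 1:
        # More than one parity error within the month: hardware failure.
        return "replace route processor"
    # A single (first) parity error: treat as transient and keep watching.
    return "monitor for one month"

now = datetime(2024, 3, 31)
print(parity_action([datetime(2024, 3, 30)], now))
# monitor for one month
print(parity_action([datetime(2024, 3, 10), datetime(2024, 3, 30)], now))
# replace route processor
```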
Bus Error
Cause: the processor tries to access a memory location that either does not exist (a software error) or does not respond properly (a hardware problem).
You can find the memory location by looking at the address in the output of the show version command, as shown in this example:
Router#show version
Router uptime is 2 days, 21 hours, 30 minutes
System restarted by bus error at PC 0x30EE546, address 0xBB4C4
Take the address that the router was accessing when the bus error occurred, then issue the show region command to determine which memory region that address belongs to.
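When many routers have to be checked, the PC and address can be extracted from the restart line automatically. The following Python sketch assumes the line has exactly the wording shown in the example above; the function name is hypothetical.

```python
import re

# Matches the restart line shown in the 'show version' example above.
BUS_ERROR_RE = re.compile(
    r"bus error at PC (0x[0-9A-Fa-f]+), address (0x[0-9A-Fa-f]+)"
)

def parse_bus_error(line):
    """Return (pc, address) as integers, or None if the line does not match."""
    match = BUS_ERROR_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)

line = "System restarted by bus error at PC 0x30EE546, address 0xBB4C4"
pc, address = parse_bus_error(line)
print(hex(pc), hex(address))  # 0x30ee546 0xbb4c4
```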
Solution:
1) If the address reported by the bus error does not fall within any of the ranges displayed in the output of the show region command, the router was trying to access an invalid address. This indicates a software problem. The first step is to upgrade the IOS software to the latest version. If the crash still occurs afterwards, the second step is to open a TAC case.
2) If the address falls within one of the ranges in the show region command output, the router was accessing a valid memory address, but the hardware corresponding to that address did not respond properly. The next action is to replace the problematic hardware, which is most often the route processor.
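The two cases above reduce to a range check. The following Python sketch illustrates it; the function name is hypothetical, and the region ranges are invented placeholders, not real show region output.

```python
def classify_bus_error(address, regions):
    """Classify a bus error address against inclusive (start, end) ranges."""
    for start, end in regions:
        if start <= address <= end:
            # Valid address, but the hardware behind it did not respond.
            return "hardware"
    # Address outside every known region: invalid access, software problem.
    return "software"

# Hypothetical ranges for illustration only; real values come from the
# 'show region' output of the affected router.
example_regions = [(0x00000000, 0x007FFFFF), (0x60000000, 0x61FFFFFF)]

print(classify_bus_error(0x000BB4C4, example_regions))  # hardware
print(classify_bus_error(0x0D0D0D0D, example_regions))  # software
```

The result only points to the likely cause; the corresponding actions are the ones listed above.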
Illegal Opcode & Sigtrap Exception
Cause: IOS software failure
Solution: first upgrade the IOS software to the latest version. If the crash still happens afterwards, open a TAC case.
Watchdog timeout
1) When no crashinfo file is generated and the message *** Watch Dog Timeout *** is seen in the logs, the cause is most probably hardware. The solution is to replace the route processor.
2) When the message "Process aborted on watchdog timeout" is seen in the logs and the crashinfo mentions a Software forced crash, the cause is an IOS software problem. Handle it as described under Software Forced Crash.
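The two watchdog cases can be told apart with a simple text check. The following Python sketch assumes the log and crashinfo contents are available as strings; the function name and return values are hypothetical.

```python
def classify_watchdog(log_text, crashinfo_text=None):
    """Return 'hardware', 'software' or 'unknown' for a watchdog timeout.

    crashinfo_text is None when no crashinfo file was generated.
    """
    if crashinfo_text is None and "*** Watch Dog Timeout ***" in log_text:
        return "hardware"
    if (
        "Process aborted on watchdog timeout" in log_text
        and crashinfo_text is not None
        and "software forced crash" in crashinfo_text.lower()
    ):
        return "software"
    # Neither pattern matched: the rules above do not cover this case.
    return "unknown"

print(classify_watchdog("*** Watch Dog Timeout ***"))  # hardware
print(classify_watchdog("Process aborted on watchdog timeout",
                        "Software forced crash"))       # software
```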
Software Forced Crash
Cause: IOS software failure, typically a memory corruption.
Solution: first upgrade the IOS software to the latest version. If the crash still happens afterwards, open a TAC case.