08-26-2014 12:22 AM - edited 03-07-2019 08:31 PM
One of our WS-C6509 switches crashed and rebooted automatically. Can anyone help find the root cause? Thanks!
Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9_WAN-M), Version 12.2(33)SXI8a, RELEASE SOFTWARE (fc1)
System image file is "disk0:s72033-adventerprisek9_wan-mz.122-33.SXI8a.bin"
Last reload reason: error - a Software forced crash, PC 0x42B037D8
Jul 27 01:08:50 BJ: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_DATA_PARITY_ERROR
Jul 27 01:08:50 BJ: %SYSTEM_CONTROLLER-3-FATAL: An unrecoverable error has been detected. The system is being reset.
%Software-forced reload
Early Notification of crash condition..
01:08:50 BJ Sun Jul 27 2014: Breakpoint exception, CPU signal 23, PC = 0x42B037D8
--------------------------------------------------------------------
Possible software fault. Upon reccurence, please collect
crashinfo, "show tech" and contact Cisco Technical Support.
--------------------------------------------------------------------
-Traceback= 42B037D8 42B0132C 426CE1DC 42AF661C
$0 : 00000000, AT : 44EF0000, v0 : 46AD0000, v1 : 00000000
a0 : 47B05CE4, a1 : 0000FF00, a2 : 00000000, a3 : 00000000
t0 : 00000020, t1 : 3400FF01, t2 : 3400C100, t3 : FFFF00FF
t4 : 42AF6760, t5 : 50012358, t6 : 00000000, t7 : B4EB6DB5
s0 : 00000000, s1 : 44D40000, s2 : 44CB0000, s3 : 00000001
s4 : 44CB0000, s5 : 10020000, s6 : 00000068, s7 : 444A0000
t8 : 08028FEC, t9 : 00000000, k0 : 00000000, k1 : 00000000
gp : 44EED8E4, sp : 50012488, s8 : 00000001, ra : 42B0132C
EPC : 42B037D8, ErrorEPC : 94D877EE, SREG : 3400FF03
MDLO : 00000000, MDHI : 00000000, BadVaddr : 00000000
DATA_START : 0x448523D0
Cause 00000024 (Code 0x9): Breakpoint exception
========= Start of Crashinfo Collection (01:08:50 BJ Sun Jul 27 2014) ==========
For image:
Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9_WAN-M), Version 12.2(33)SXI8a, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2011 by Cisco Systems, Inc.
Compiled Sat 03-Dec-11 07:53 by prod_rel_team
08-26-2014 12:28 AM
Hi,
Please find the explanation below:
Jul 27 01:08:50 BJ: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_DATA_PARITY_ERROR
Jul 27 01:08:50 BJ: %SYSTEM_CONTROLLER-3-FATAL: An unrecoverable error has been detected. The system is being reset.
%Software-forced reload
Early Notification of crash condition..
01:08:50 BJ Sun Jul 27 2014: Breakpoint exception, CPU signal 23, PC = 0x42B037D8
Explanation
The most common errors from the Mistral ASIC on the Multilayer Switch Feature Card (MSFC) are TM_DATA_PARITY_ERROR, SYSDRAM_PARITY_ERROR,
SYSAD_PARITY_ERROR, and TM_NPP_PARITY_ERROR. The possible causes of these parity errors are random static discharge or other external factors.
Parity errors are of two kinds:
- Soft parity errors: these occur when an energy level within the chip (for example, a one or a zero) changes. When the CPU references the affected data, the system either crashes or recovers. A soft parity error does not require swapping the board or any of its components, because these errors are generally Single Event Upsets (SEUs).
- Hard parity errors: these occur when a chip or board failure causes data to be corrupted. In this case, you need to re-seat or replace the affected component, usually by swapping a memory chip or the board. We call it a hard parity error when we see multiple parity errors at the same address. Some cases are more complicated and harder to identify but, in general, more than one parity error in a particular memory region within a relatively short period of time may be considered a hard parity error.
As this is the first occurrence, it could be a transient issue. I suggest monitoring the switch for 48 hours; if it remains stable with no recurrence, we can treat this as a transient issue.
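If you collect the switch's syslog during the monitoring period, a small script can help spot repeated parity errors. The following is only a minimal Python sketch, not a Cisco tool. The function names `parity_events` and `classify` are hypothetical, the year is passed in because the log timestamps do not carry one, and the 48-hour window is an assumed threshold borrowed from the monitoring period. It looks only at timing, since these log lines do not show the memory address, so a repeat flagged here is a reason to investigate further rather than proof of a hard error.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# Jul 27 01:08:50 BJ: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_DATA_PARITY_ERROR
PARITY_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+\S+:\s+"
    r"%SYSTEM_CONTROLLER-3-ERROR: Error condition detected: "
    r"(?P<kind>\w+_PARITY_ERROR)"
)


def parity_events(lines, year):
    """Return sorted (timestamp, error_name) tuples for parity error lines.

    The year is an assumption supplied by the caller, because the syslog
    timestamps in this format do not include it.
    """
    events = []
    for line in lines:
        m = PARITY_RE.match(line.strip())
        if m:
            stamp = f"{year} {m['mon']} {m['day']} {m['time']}"
            events.append(
                (datetime.strptime(stamp, "%Y %b %d %H:%M:%S"), m["kind"])
            )
    return sorted(events)


def classify(events, window=timedelta(hours=48)):
    """Rough triage: two parity errors within the window suggest a hard error."""
    if not events:
        return "no parity errors seen"
    for (first, _), (second, _) in zip(events, events[1:]):
        if second - first <= window:
            return "possible hard parity error"
    return "likely transient"


if __name__ == "__main__":
    sample = [
        "Jul 27 01:08:50 BJ: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_DATA_PARITY_ERROR",
        "Jul 27 01:08:50 BJ: %SYSTEM_CONTROLLER-3-FATAL: An unrecoverable error has been detected. The system is being reset.",
    ]
    print(classify(parity_events(sample, 2014)))
```

Feeding it the full log after the monitoring period gives a quick first check; the final call on re-seating or replacing hardware should still rest on the crashinfo and, if the error repeats, on Cisco Technical Support.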
Please let me know whether you have any questions or concerns with the analysis above.
Regards
Inayath
*Please rate if this info is helpful.
08-28-2014 11:32 PM
Hi insharie,
The switch has been working fine since the reboot. Maybe it was a soft parity error. Thank you for your reply.