03-09-2013 04:57 AM - edited 03-07-2019 12:08 PM
Hello, we have a backup Sup 720 with a 2-Gigabit Ethernet port channel to another chassis. Suddenly UDLD detected an error and put the interface into err-disable, but the err-disable state did not bring the interface down. That created a switching loop, and then our Supervisor reloaded. I'd like to know what could have caused this reload, in case anyone has run into the same issue. In my opinion it could have been the switching loop, but I've also run the show tech through the Output Interpreter and it might have been a bug. The only one that seems to match IOS version 12.2(33)SXH is this one:
http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCtj95352&from=summary
I guess we'll disable err-disable next time and recover the link manually. Apart from that, if anyone has ever had this issue, what could have made the Sup crash and reload?
Kind regards.
03-11-2013 09:22 AM
Hi Pastrana,
I decoded the traceback from the crashinfo file, and it shows that the device crashed due to a parity error.
Cache error detected!
CP0_ECC (reg 26/0): 0x000000FC
CP0_CACHERI (reg 27/0): 0x20000000
CP0_CAUSE (reg 13/0): 0x00000800
Real cache error detected. System will be halted.
Error: Primary instr cache, fields: data,
Actual physical addr 0x00000000,
virtual address is imprecise.
Imprecise Data Parity Error
Imprecise Data Parity Error
21:50:17 GMT+1 Thu Mar 7 2013: Interrupt exception, CPU signal 20, PC = 0x420DA8A4
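If you want to pull these values out of a crashinfo file without reading it by eye, something like the following may help. It is only a minimal sketch: the function name `parse_cache_error` and the returned dictionary layout are made up for illustration, and it assumes the register lines look exactly like the ones pasted above. It does not interpret the register bits, it only extracts them.

```python
import re

# Matches lines such as "CP0_CAUSE (reg 13/0): 0x00000800".
# [0O] tolerates the letter O that sometimes replaces the zero in pasted logs.
REG_LINE = re.compile(
    r"CP[0O]_(\w+)\s+\(reg\s+(\d+)/(\d+)\):\s+0x([0-9A-Fa-f]+)"
)


def parse_cache_error(text):
    """Extract CP0 register values and simple flags from crashinfo text.

    Hypothetical helper: the name and return layout are illustrative only.
    """
    registers = {}
    for match in REG_LINE.finditer(text):
        name, _reg, _sel, value = match.groups()
        # Normalize the prefix to "CP0_" and store the value as an integer.
        registers["CP0_" + name] = int(value, 16)
    return {
        "cache_error": "Cache error detected" in text,
        "parity_error": "Parity Error" in text,
        "registers": registers,
    }


# Sample text copied from the crash output in this thread.
sample = """Cache error detected!
CP0_ECC (reg 26/0): 0x000000FC
CP0_CACHERI (reg 27/0): 0x20000000
CP0_CAUSE (reg 13/0): 0x00000800
Real cache error detected. System will be halted.
Imprecise Data Parity Error
"""

result = parse_cache_error(sample)
print(result["cache_error"], result["parity_error"])
for name, value in result["registers"].items():
    print(f"{name} = 0x{value:08X}")
```

Running this over each crashinfo file you collect gives you a consistent record to compare if the switch crashes again.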
Explanation and Action plan:
====================
In most cases, a parity error is a transient (soft) event and does not recur after a reset.
These are the two kinds of parity errors:
Soft parity errors
These errors occur when an energy level within the chip changes (for example, a one
flips to a zero). When the CPU references the affected data, the system either crashes
(if the error is in an area that is not recoverable) or recovers the affected subsystem
(for example, a CyBus complex restarts if the error was in the packet memory (MEMD)).
In case of a soft parity error, there is no need to swap the board or any of its components.
Hard parity errors
These errors occur when there is a chip or board failure that corrupts data. In this case,
you need to re-seat or replace the affected component, which usually involves a memory
chip swap or a board swap. There is a hard parity error when multiple parity errors occur
at the same address. There are more complicated cases that are harder to identify. In
general, if you see more than one parity error in a particular memory region in a
relatively short period, you can consider it to be a hard parity error.
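The rule of thumb above (repeated errors in the same memory region within a short period point to a hard error) can be sketched as a small script. This is an illustration under assumptions, not a Cisco tool: the function name `classify_parity_errors`, the region labels, and the 30-day window are all hypothetical, since the text does not define how short "relatively short" is.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def classify_parity_errors(events, window=timedelta(days=30)):
    """Label each memory region as "soft" or "suspected hard".

    events: iterable of (timestamp, region) tuples.
    window: assumed meaning of "relatively short period" (hypothetical value).
    """
    by_region = defaultdict(list)
    for timestamp, region in events:
        by_region[region].append(timestamp)

    verdicts = {}
    for region, stamps in by_region.items():
        stamps.sort()
        # Two consecutive errors in the same region inside the window
        # are treated as a sign of a hard parity error.
        repeated = any(b - a <= window for a, b in zip(stamps, stamps[1:]))
        verdicts[region] = "suspected hard" if repeated else "soft"
    return verdicts


events = [
    # The single crash reported in this thread.
    (datetime(2013, 3, 7, 21, 50, 17), "primary instr cache"),
    # Hypothetical entries, added only to show the repeated-error case.
    (datetime(2013, 1, 2, 8, 0, 0), "packet memory"),
    (datetime(2013, 1, 9, 8, 0, 0), "packet memory"),
]

verdicts = classify_parity_errors(events)
for region, verdict in verdicts.items():
    print(f"{region}: {verdict}")
```

With only one crash on record, as in this case, the region comes out as "soft", which matches the advice to monitor rather than replace hardware.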
Suggestion:
Studies have shown that soft parity errors are 10 to 100 times more frequent than hard
parity errors. Therefore, Cisco highly recommends that you wait until a hard parity error
is confirmed before you replace the Supervisor. This greatly reduces the impact on your network.
To learn more about parity errors, please check the following CCO documentation:
https://www.cisco.com/en/US/products/hw/routers/ps341/products_tech_note09186a0080094793.html
HTH
Regards
Inayath
*Please rate useful posts.
03-10-2013 05:24 PM
Hi,
Could you please provide the show tech and the crashinfo file from the switch?
Regards
Inayath.
03-11-2013 12:47 AM
03-12-2013 02:42 AM
Thank you very much for your answer.