Problem
A double-bit ECC error causes the LC to reboot as soon as the error is seen, so there is not enough time to capture all the logs when the error occurs. On eXR systems the issue is more severe because the logs captured after a double-bit ECC error are stored on the LC disk. These logs are lost if the LC is replaced with a different LC.
Solution
The current solution applies only to eXR-based systems. An EEM script can be used to make sure that all the logs are captured as soon as a double-bit ECC error is hit. A sample script is attached to this document (File Name: ecc_capture.tcp.zip).
The script does the following:
1. Disables the LC reload.
2. Captures all the logs (NP datalog, show tech NP, etc.).
3. Reloads the LC.
Configs required in the Node
event manager environment _syslog_pattern .*(Double-bit ECC error detected).*
event manager directory user policy harddisk:
event manager policy ecc_capture.tcl username eem persist-time 3600 type user
aaa authorization eventmanager default local
username eem group root-lr
username eem group cisco-support
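The _syslog_pattern value above is a regular expression that the policy matches against incoming syslog messages. The sketch below uses Python's re module as a stand-in for the EEM matcher to show which messages would trigger the policy. The sample syslog lines and the triggers_policy helper are hypothetical and exist only for illustration; the exact message text on a given platform may differ.

```python
import re

# Same regular expression as the _syslog_pattern environment variable.
SYSLOG_PATTERN = r".*(Double-bit ECC error detected).*"


def triggers_policy(syslog_line: str) -> bool:
    """Return True if the line matches the pattern (hypothetical helper)."""
    return re.match(SYSLOG_PATTERN, syslog_line) is not None


# Hypothetical syslog lines, not copied from a real device.
sample_hit = "LC/0/0/CPU0: Double-bit ECC error detected on NP0"
sample_miss = "LC/0/0/CPU0: Single-bit ECC error corrected on NP0"

print(triggers_policy(sample_hit))
print(triggers_policy(sample_miss))
```

The pattern is case-sensitive and matches the phrase anywhere in the message, so a message that words the error differently would not trigger the policy.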
Steps to enable the script
1. Save the script on the harddisk of both RSPs and name it ecc_capture.tcl.
2. Add the configuration listed in the "Configs required in the Node" section.