08-09-2022 02:38 PM
Hi everyone, can someone help me understand these syslog messages on an NCS540 (N540-28Z4C-SYS-A)?
The messages appeared in the sequence below, after which the NCS reloaded automatically without anyone triggering the reload.
shelfmgr[185]: %PLATFORM-CPA_INTF_SHELFMGR-3-HW_FAULT_RECOVERY : node0_RP0_CPU0: SEU Correctable error(intsts:0x18, sem_status:0x145)
processmgr[51]: %OS-SYSMGR-6-INFO : Prepared RMF to reboot
processmgr[51]: %MGBL-SCONBKUP-6-INTERNAL_INFO : Reload debug script successfully spawned
10-12-2022 11:13 AM
The device rebooted because a self-recovery mechanism was invoked to correct the SEU correctable error. An SEU (Single Event Upset) is a "soft" parity error event: it is typically transient or random in nature and usually occurs only once, as a result of an environmental disruption of the memory data.
These are not caused by hardware malfunction.
Research has shown that the majority of single-event (or "soft") errors in memory chips occur as a result of background radiation (chiefly neutrons from cosmic rays), electrostatic discharge (ESD), or electromagnetic interference (EMI), which may randomly change the electrical state of one or more memory cells or interfere with the circuitry used to read and write them.
If you encounter soft parity errors, analyze recent environmental changes that have occurred at the location of the affected system.
Common sources of ESD and EMI that may cause soft parity errors include:
Power cables and supplies
Power distribution units
Uninterruptible power supplies
Lighting systems
Power generators
Nuclear facilities (radiation)
Solar flares (radiation)
If the box reloads multiple times, we can consider an RMA.
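To judge whether this was a one-off event or a repeating one, it can help to count the SEU messages per node in the collected logs. The following is only a minimal Python sketch, not a Cisco tool: it assumes the messages have the shape shown in the original post (process[pid]: %CATEGORY-GROUP-SEVERITY-CODE : text), and the function names parse_line and count_seu_events are made up for illustration.

```python
import re
from collections import Counter

# Assumed message shape, taken from the lines quoted in this thread:
#   process[pid]: %CATEGORY-GROUP-SEVERITY-CODE : free text
LINE_RE = re.compile(
    r"(?P<process>\w+)\[(?P<pid>\d+)\]:\s+"
    r"%(?P<category>[A-Z0-9_]+)-(?P<group>[A-Z0-9_]+)"
    r"-(?P<severity>\d)-(?P<code>[A-Z0-9_]+)\s*:\s*"
    r"(?P<text>.*)"
)
# The SEU message text starts with the node name, e.g. "node0_RP0_CPU0: ..."
NODE_RE = re.compile(r"^(?P<node>node\S+?):\s")


def parse_line(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None


def count_seu_events(lines):
    """Count HW_FAULT_RECOVERY messages that mention SEU, grouped by node."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        if rec["code"] != "HW_FAULT_RECOVERY" or "SEU" not in rec["text"]:
            continue
        node = NODE_RE.match(rec["text"])
        counts[node.group("node") if node else "unknown"] += 1
    return counts


# The three lines from the original post, used as sample input.
SAMPLE = [
    "shelfmgr[185]: %PLATFORM-CPA_INTF_SHELFMGR-3-HW_FAULT_RECOVERY : "
    "node0_RP0_CPU0: SEU Correctable error(intsts:0x18, sem_status:0x145)",
    "processmgr[51]: %OS-SYSMGR-6-INFO : Prepared RMF to reboot",
    "processmgr[51]: %MGBL-SCONBKUP-6-INTERNAL_INFO : "
    "Reload debug script successfully spawned",
]

if __name__ == "__main__":
    for node, count in count_seu_events(SAMPLE).items():
        print(f"{node}: {count} SEU recovery event(s)")
```

A single hit for a node over a long period fits the transient case described above. Several hits on the same node are the "reloading multiple times" case where a TAC case and RMA are worth raising.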
Sam
10-12-2022 06:32 AM
Had the same issue happen on an N540X-4Z14G2Q-A last night. Did you ever open a case with Cisco?