Nexus N5K-C5596T crashed due to SUNNYVALE ASIC FAILURE

AVRAHAM NINO
Level 1

Hello,

I have a Nexus N5K-C5596T that crashed a few hours ago due to SUNNYVALE ASIC FAILURE.

I can't find any documentation on it.

 sh system reset-reason
----- reset reason for Supervisor-module 1 (from Supervisor in slot 1) ---
1) At 743926 usecs after Sun Aug 26 05:16:57 2018
    Reason: Reset performed due to component Error
    Service: SUNNYVALE ASIC FAILURE
    Version: 7.3(3)N1(1)

    Reason: Unknown
    Service:
    Version: 7.3(3)N1(1)

I'd appreciate any help before I open a TAC case.

AVRAHAM NINO

TAC Answer:

This crash may be a transient problem. An ASIC hit a fatal error, which can be caused by transient environmental triggers such as those that cause parity errors.
Please keep this switch under monitoring. If a similar crash occurs again, the switch must be replaced.
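One way to follow the monitoring advice is to check the reset history periodically for a repeat ASIC failure. The Python sketch below assumes the `show system reset-reason` text format shown in the question (how you collect that text is out of scope) and treats a second ASIC-failure entry as the replacement trigger, matching the advice above. The helper names (`parse_reset_reasons`, `asic_failure_count`, `should_replace`) are hypothetical, the parser ignores the timestamp line, and it was written against this one sample only, so other NX-OS releases may format the output differently.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Sample copied from the output in the question; the format is assumed, not guaranteed.
SAMPLE = """\
----- reset reason for Supervisor-module 1 (from Supervisor in slot 1) ---
1) At 743926 usecs after Sun Aug 26 05:16:57 2018
    Reason: Reset performed due to component Error
    Service: SUNNYVALE ASIC FAILURE
    Version: 7.3(3)N1(1)

    Reason: Unknown
    Service:
    Version: 7.3(3)N1(1)
"""

FIELD_RE = re.compile(r"\s*(Reason|Service|Version):\s*(.*)$")


@dataclass
class ResetEntry:
    reason: str
    service: str
    version: str


def parse_reset_reasons(text: str) -> List[ResetEntry]:
    """Group each Reason/Service/Version triple into one ResetEntry."""
    entries: List[ResetEntry] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if not m:
            continue  # skips headers and the "At ... usecs after" timestamp line
        key, value = m.group(1).lower(), m.group(2).strip()
        current[key] = value
        if key == "version":  # in the sample, Version closes each block
            entries.append(ResetEntry(current.get("reason", ""),
                                      current.get("service", ""),
                                      value))
            current = {}
    return entries


def asic_failure_count(entries: List[ResetEntry]) -> int:
    """Count resets whose Service field names an ASIC failure."""
    return sum(1 for e in entries if "ASIC FAILURE" in e.service.upper())


def should_replace(entries: List[ResetEntry]) -> bool:
    """TAC advice: one ASIC crash may be transient; a repeat means replace."""
    return asic_failure_count(entries) >= 2


if __name__ == "__main__":
    entries = parse_reset_reasons(SAMPLE)
    print(len(entries), asic_failure_count(entries), should_replace(entries))
    # 2 1 False
```

With only the single crash from the question, the sketch reports one ASIC failure and does not flag the switch; a second matching entry in a later capture would.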
If you want to know more about parity errors (hardware fatal error triggers), please see below:

Soft Errors
Most parity errors are caused by electrostatic or magnetic-related environmental conditions.
The majority of single-event errors in memory chips are caused by background radiation (such as neutrons from cosmic rays), electromagnetic interference (EMI), or electrostatic discharge (ESD). These events may randomly change the electrical state of one or more memory cells or may interfere with the circuitry used to read and write memory cells.
These events, known as soft parity errors, are typically transient or random and usually occur only once. Soft errors can be minor or severe:

- Minor soft errors that can be corrected without a component reset are single event upsets (SEUs).
- Severe soft errors that require a component or system reset are single event latchups (SELs).

Soft errors are not caused by a hardware malfunction. They are transient and infrequent, are most likely SEUs, and result from an environmental disruption of the memory data.
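To make the detection mechanism concrete, here is a minimal Python sketch of single-bit even parity, the simplest form of the check these errors are named after. It is generic: the source does not say whether the memories in this ASIC use plain parity or a stronger ECC scheme, and the function names are illustrative. A single flipped bit (an SEU) makes the recomputed parity disagree with the stored bit, while two flipped bits cancel each other out and go undetected, which is a known limitation of a single parity bit.

```python
def even_parity_bit(word: int) -> int:
    """Return the bit that makes the total count of 1s (data + parity) even."""
    return bin(word).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """Recompute parity on read and compare it with the bit stored on write."""
    return even_parity_bit(word) == stored_parity


def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset: one memory cell changes state."""
    return word ^ (1 << bit)


if __name__ == "__main__":
    data = 0b1011_0010          # value written to memory (four 1 bits)
    p = even_parity_bit(data)   # parity bit stored alongside it: 0

    upset = flip_bit(data, 5)                # one bit flipped, e.g. by a neutron strike
    double = flip_bit(flip_bit(data, 5), 1)  # two bits flipped

    print(parity_ok(data, p))    # True  (no error)
    print(parity_ok(upset, p))   # False (single-bit error detected)
    print(parity_ok(double, p))  # True  (double-bit error goes unnoticed)
```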

If you encounter soft parity errors, analyze recent environmental changes that have occurred at the location of the affected system. Common sources of ESD and EMI that may cause soft parity errors include:

- Power cables and supplies
- Power distribution units
- Uninterruptible power supplies (UPS)
- Lighting systems
- Power generators
- Nuclear facilities (radiation)
- Solar flares (radiation)

Hard Errors
Other parity errors are caused by a physical malfunction of the memory hardware or of the circuitry used to read and write memory cells.
Hardware manufacturers take extensive measures to prevent and test for hardware defects. However, defects are still possible; for example, if any of the memory cells used to store data bits are malformed, they may be unable to hold a charge or may be more vulnerable to environmental conditions.

Similarly, while the memory itself may be operating normally, any physical or electrical damage to the circuitry used to read and write memory cells may also cause data bits to be changed during transfer, which results in a parity error.

These events, known as hard parity errors, are typically frequent and repeatable: they occur whenever the affected memory or circuitry is used. The exact frequency depends on the extent of the malfunction and how often the damaged equipment is used.
Remember that hard parity errors are the result of a hardware malfunction and recur whenever the affected component is used.
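The soft/hard distinction above comes down to recurrence, which suggests a simple triage heuristic. The Python sketch below is an assumption, not a Cisco procedure: it takes a list of hypothetical memory-location labels, one per logged parity event over some observation period, and labels a location "hard" if it errors at least `repeat_threshold` times (an assumed tuning value) and "soft" if the error was isolated.

```python
from collections import Counter
from typing import Dict, List


def classify_parity_events(locations: List[str], repeat_threshold: int = 2) -> Dict[str, str]:
    """Label each location 'soft' (isolated) or 'hard' (recurring).

    Heuristic only: soft errors usually occur once, while hard errors recur
    whenever the faulty memory or read/write circuitry is used.
    repeat_threshold is an assumed tuning value, not a vendor figure.
    """
    counts = Counter(locations)
    return {
        location: "hard" if n >= repeat_threshold else "soft"
        for location, n in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical location labels, one entry per logged parity event.
    log = ["asic0/table-a", "asic0/table-b", "asic0/table-b", "asic0/table-b"]
    print(classify_parity_events(log))
    # {'asic0/table-a': 'soft', 'asic0/table-b': 'hard'}
```

A single threshold is deliberately crude; in practice you would also weigh how close together the repeats are, since the text ties hard errors to how often the damaged component is used.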
If you encounter hard parity errors, analyze physical changes that have occurred at the location of the affected system. Common sources of hardware malfunction that may lead to hard parity errors include:

- Power surges (no ground)
- ESD
- Overheating or cooling
- Incorrect or partial installation
- Component incompatibility
- Manufacturing defect
