Nexus 7010 - N7K-M108X2-12L module reloaded by itself - LTL parity error

jgeralsky
Level 1

Hello All,

The N7K-M108X2-12L module in slot 7 reloaded by itself.

This is the first time it has happened, and everything has been working fine since the reload.
The following error message was logged at the time of the reload:

 

2015 Sep 15 09:03:09 kb1_MGMT %MODULE-2-MOD_DIAG_FAIL: Module 7 reported failure Ethernet7/7-8due to LTL parity error interrupt in device DEV_METROPOLIS (device error 0xc4f0320b)

 

kb1_MGMT# show module

Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
4    48     1000 Mbps Optical Ethernet XL Modul N7K-M148GS-11L     ok
5    0      Supervisor Module-2                 N7K-SUP2           active *
6    0      Supervisor Module-2                 N7K-SUP2           ha-standby
7    8      10 Gbps Ethernet XL Module          N7K-M108X2-12L     ok
8    24     10 Gbps Ethernet Module             N7K-M224XP-23L     ok

Mod  Sw              Hw
---  --------------  ------
4    6.2(12)         1.4     
5    6.2(12)         3.0     
6    6.2(12)         1.0     
7    6.2(12)         1.6     
8    6.2(12)         2.1     

Mod  Online Diag Status
---  ------------------
4    Pass
5    Pass
6    Pass
7    Pass
8    Pass

Xbar Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    0      Fabric Module 2                     N7K-C7010-FAB-2    ok
2    0      Fabric Module 2                     N7K-C7010-FAB-2    ok
3    0      Fabric Module 2                     N7K-C7010-FAB-2    ok

 

kb1_MGMT# show module internal exceptionlog module 7 

********* Exception info for module 7 ********

exception information --- exception instance 1 ----
Module Slot Number: 7
Device Id         : 79
Device Name       : Metropolis
Device Errorcode  : 0xc4f0320b
Device ID         : 79 (0x4f)
Device Instance   : 03 (0x03)
Dev Type (HW/SW)  : 02 (0x02)
ErrNum (devInfo)  : 11 (0x0b)
System Errorcode  : 0x41140051 LTL parity error interrupt
Error Type        : FATAL error
PhyPortLayer      : Ethernet
Port(s) Affected  : Ethernet7/7-8
Error Description : ME_KR_INTR_EC_CP_INT_3_LTL_PARITY_ERR
DSAP              : 0 (0x0)
UUID              : 0 (0x0)
Time              : Tue Sep 15 09:03:09 2015
                    (Ticks: 55F7C2AD jiffies) 
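
For anyone comparing their own exception log against this one: the decoded fields above (Device ID, Device Instance, Dev Type, ErrNum) appear to be packed into the 32-bit Device Errorcode. The sketch below splits the code accordingly. The bit layout is only inferred from this single sample (0xc4f0320b), not taken from Cisco documentation, so treat it as an assumption; `decode_device_error` is a hypothetical helper name.

```python
def decode_device_error(code: int) -> dict:
    """Split a 32-bit device error code into the fields shown in the exception log.

    ASSUMPTION: the field positions are inferred from one sample
    (0xc4f0320b) and may not hold for other devices or releases.
    """
    return {
        "prefix": (code >> 28) & 0xF,            # top nibble (0xc here); meaning unknown
        "device_id": (code >> 20) & 0xFF,        # 0x4f = 79 (Metropolis in this log)
        "device_instance": (code >> 12) & 0xFF,  # 0x03
        "dev_type": (code >> 8) & 0xF,           # 0x2
        "err_num": code & 0xFF,                  # 0x0b = 11
    }


fields = decode_device_error(0xC4F0320B)
for name, value in fields.items():
    print(f"{name:16s}: {value} (0x{value:02x})")
```

If the decoded values do not match what `show module internal exceptionlog` prints for a different error, the assumed layout does not apply to that code.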

 

The Nexus 7000 is running the following software:

  BIOS:      version 2.12.0
  kickstart: version 6.2(12)
  system:    version 6.2(12)

 

Nexus 7000 hardware:
  cisco Nexus7000 C7010 (10 Slot) Chassis ("Supervisor Module-2")
  Intel(R) Xeon(R) CPU         with 12224948 kB of memory.
  Processor Board ID xxxxxxxxx

 

I opened a Cisco case (SR 636328821).

 


jgeralsky
Level 1

Hello All,

Here is the root cause analysis from Cisco:

 

From the logs, we can see that the module in slot 7 rebooted due to a parity error.

2015 Sep 15 09:03:09 kb1_MGMT %MODULE-2-MOD_DIAG_FAIL: Module 7 reported failure Ethernet7/7-8due to LTL parity error interrupt in device DEV_METROPOLIS (device error 0xc4f0320b)
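
To keep track of whether this message shows up again, it can help to pull the fields out of the syslog line. The following is a minimal sketch that parses the exact message above; the regular expression is written against this one sample (including the missing space in "Ethernet7/7-8due"), so the pattern and the helper name `parse_mod_diag_fail` are assumptions and may need adjusting for other NX-OS releases or message variants.

```python
import re
from datetime import datetime

# Pattern derived from the single sample line in this thread (assumption).
MOD_DIAG_FAIL = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"%MODULE-2-MOD_DIAG_FAIL: Module (?P<module>\d+) reported failure "
    r"(?P<ports>\S+?)\s*due to (?P<reason>.+?) in device (?P<device>\S+) "
    r"\(device error (?P<code>0x[0-9a-fA-F]+)\)$"
)


def parse_mod_diag_fail(line: str):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = MOD_DIAG_FAIL.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Month names are parsed with the default (English) locale.
    fields["ts"] = datetime.strptime(fields["ts"], "%Y %b %d %H:%M:%S")
    fields["module"] = int(fields["module"])
    fields["code"] = int(fields["code"], 16)
    return fields


sample = (
    "2015 Sep 15 09:03:09 kb1_MGMT %MODULE-2-MOD_DIAG_FAIL: Module 7 reported "
    "failure Ethernet7/7-8due to LTL parity error interrupt in device "
    "DEV_METROPOLIS (device error 0xc4f0320b)"
)
print(parse_mod_diag_fail(sample))
```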

 

A parity error occurs when a bit is changed from its original value (0 or a 1) to the opposite value. This problem can occur as one of two different types of parity errors, soft parity errors or hard parity errors.
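
As a generic illustration of how a flipped bit is detected, here is a minimal sketch of a single even-parity check in Python. It is not the actual protection scheme used by the LTL memory on the module, which is not described in this thread; the 8-bit word and the function names are made up for the example.

```python
def even_parity_bit(word: int) -> int:
    """Return the parity bit that makes the total number of 1 bits even."""
    return bin(word).count("1") % 2


def has_parity_error(word: int, stored_parity: int) -> bool:
    """True if the word no longer matches the parity bit stored with it."""
    return even_parity_bit(word) != stored_parity


original = 0b10110010                      # value written to memory
stored_parity = even_parity_bit(original)  # parity bit stored alongside it

corrupted = original ^ (1 << 3)            # a single bit flips (0 -> 1)

print(has_parity_error(original, stored_parity))   # unchanged word passes
print(has_parity_error(corrupted, stored_parity))  # single-bit flip is caught
```

A single parity bit can only detect an odd number of flipped bits and cannot say which bit changed, which is why the hardware can only report the event rather than correct the data.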

 

1) Soft parity errors (Single Event Upset-SEU):

All computer and network systems are susceptible to the rare occurrence of Single Event Upsets (SEU), sometimes described as parity errors. These single-bit errors occur when a bit in a data word changes unexpectedly due to external events (thus causing, for example, a zero to spontaneously change to a one). SEUs are a universal phenomenon irrespective of vendor and technology. SEUs occur very infrequently, but all computer and network systems, even a PC, are subject to them. SEUs are also called soft errors: they are caused by noise, result in a transient, inconsistent error in the data, and are unrelated to a component failure.

 

2) Hard parity errors (Repeated errors):

These errors occur when there is a chip or board failure that corrupts data. A hard error is caused by a failed component, or by a board-level problem such as an improperly manufactured printed circuit board, and results in repeated occurrences of the same error. In this case, you need to re-seat or replace the affected component, which usually involves a memory chip swap or a board swap. Multiple parity errors at the same address indicate a hard parity error. There are more complicated cases that are harder to identify. In general, if you see more than one parity error in a particular memory region in a relatively short period, you can consider it to be a hard parity error.

In this case, we have seen only a single incident, so the issue is not hardware related and was corrected by the reload. No further action is required at this time; you can monitor the device and report any further issues.
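
The rule of thumb above (one isolated event is likely soft, repeated events in the same place within a short period suggest hard) can be sketched as a small monitoring helper. Everything specific here is an assumption: Cisco's text does not define "a relatively short period", so the 30-day window is arbitrary, and since the syslog message does not expose a memory address, a (module, device) pair is used as a coarse stand-in for "memory region".

```python
from collections import defaultdict
from datetime import datetime, timedelta


def classify_parity_events(events, window=timedelta(days=30)):
    """Group parity events by location and flag locations with repeats.

    events: iterable of (timestamp, location) pairs.
    window: ASSUMED threshold for "a relatively short period".
    """
    by_location = defaultdict(list)
    for when, location in events:
        by_location[location].append(when)

    verdicts = {}
    for location, times in by_location.items():
        times.sort()
        # Any two consecutive events closer together than the window count as a repeat.
        repeated = any(
            later - earlier <= window
            for earlier, later in zip(times, times[1:])
        )
        verdicts[location] = "suspect hard error" if repeated else "likely soft error"
    return verdicts


# The single incident from this thread.
events = [(datetime(2015, 9, 15, 9, 3, 9), (7, "DEV_METROPOLIS"))]
print(classify_parity_events(events))
```

With only the one event from this thread the location is reported as a likely soft error, which matches the conclusion above; a second event on the same module and device inside the window would change the verdict and would be a reason to reopen the case.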

 

 
