
%SATCTRL-FEX101-4-SOHMS_DIAG_WARN: FEX-101 Module 1: Runtime diag detected minor event: Correctable ECC errors

Asim Afzal
Level 1

Hi,

I am getting the error below in the logs. There is no impact, but I keep getting this alert.


AEAUH-INJDC-01NB01-N5KSFSW01 %SATCTRL-FEX101-4-SOHMS_DIAG_WARN: FEX-101 Module 1: Runtime diag detected minor event: Correctable ECC errors <dev=0, count=3>

Below is the output of show version:

Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Documents: http://www.cisco.com/en/US/products/ps9372/tsd_products_support_series_home.html
Copyright (c) 2002-2013, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained herein are owned by
other third parties and are used and distributed under license.
Some parts of this software are covered under the GNU Public
License. A copy of the license is available at
http://www.gnu.org/licenses/gpl.html.

Software
BIOS: version 3.6.0
loader: version N/A
kickstart: version 6.0(2)N2(1)
system: version 6.0(2)N2(1)
Power Sequencer Firmware:
Module 1: version v1.0
Module 2: version v1.0
Microcontroller Firmware: version v1.0.0.1
SFP uC: Module 1: v1.1.0.0
QSFP uC: Module not detected
BIOS compile time: 05/09/2012
kickstart image file is: bootflash:///n5000-uk9-kickstart.6.0.2.N2.1.bin
kickstart compile time: 7/24/2013 3:00:00 [07/24/2013 14:49:21]
system image file is: bootflash:///n5000-uk9.6.0.2.N2.1.bin

3 Replies

Mark Malone
VIP Alumni

Hi,

That usually indicates memory errors.

Check sh diagnostic result fex 101 and see whether it returns an OK status:

xxxxxxxxxxx# sh diagnostic result fex 101
FEX-101: Fabric Extender 48x1GE + 4x10G Module  SerialNo   : xxxxxxxxxxx
Overall Diagnostic Result for FEX-101  : OK

Test results: (. = Pass, F = Fail, U = Untested)
TestPlatform:
0)              SPROM: ---------------> .
1)   Inband interface: ---------------> .
2)                Fan: ---------------> .
3)       Power Supply: ---------------> .
4) Temperature Sensor: ---------------> .

TestForwardingPorts:
Eth    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Port ------------------------------------------------------------------------
       .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .

Eth   25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
Port ------------------------------------------------------------------------
       .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .

TestFabricPorts:
Fabric 1  2  3  4
Port ------------
       .  .  .  .
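
For anyone checking several FEXes, the platform section of this output can be screened with a small script. The following is only a rough sketch in Python, assuming the output has already been captured as text and follows the layout pasted above; the function name diag_summary is hypothetical and not part of any Cisco tool.

```python
import re

def diag_summary(output):
    """Return (overall_result, failed_platform_tests) from captured
    'sh diagnostic result fex N' text. Layout is assumed to match the
    sample in this thread; other releases may format it differently."""
    overall = None
    failed = []
    for line in output.splitlines():
        # e.g. "Overall Diagnostic Result for FEX-101  : OK"
        m = re.search(r"Overall Diagnostic Result for FEX-\d+\s*:\s*(\S+)", line)
        if m:
            overall = m.group(1)
            continue
        # e.g. "2)                Fan: ---------------> ."
        m = re.match(r"\s*\d+\)\s*(.+?):\s*-+>\s*([.FU])\s*$", line)
        if m and m.group(2) == "F":
            failed.append(m.group(1).strip())
    return overall, failed
```

Note that a clean result here does not rule out the ECC warning, since the tests listed cover SPROM, inband interface, fan, power supply, temperature and ports rather than memory.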

greencolin
Level 1

Hi,

Have you found a solution to this issue?

I have similar errors in the log and suspect memory/CPU problems.

I ran sh processes cpu history on the FEX and found a high workload.

Arun Yadav
Cisco Employee

Hi Asim,


This is not a software issue. This error means that a single-bit ECC correction (error correction) was made in the FEX SDRAM.
It is harmless because the hardware was able to correct the memory error via ECC. There is a counter that tracks these corrections:

 

prt> show new_ints


| SS9 : ssx_int_err_ecc1 |
|--+---------+----------------------------------+
|6 |000000050 | single-bit ECC error | main memory bank 1
|-----------------------------------------------|

 

If this keeps recurring, the FEX should be replaced.
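
To judge whether the warning is recurring, one option is to count how often each FEX logs it in a saved syslog file. Below is a minimal sketch in Python, assuming the messages look like the line quoted in the original post. The function names and the threshold of 3 are illustrative assumptions, not a Cisco-defined limit.

```python
import re
from collections import Counter

# Pattern modeled on the log line quoted in this thread;
# other NX-OS releases may word the message differently.
ECC_RE = re.compile(
    r"%SATCTRL-FEX(?P<fex>\d+)-4-SOHMS_DIAG_WARN:.*?"
    r"Correctable ECC errors\s*<dev=(?P<dev>\d+),\s*count=(?P<count>\d+)>"
)

def ecc_events(lines):
    """Yield (fex, dev, count) for each correctable-ECC warning found."""
    for line in lines:
        m = ECC_RE.search(line)
        if m:
            yield int(m["fex"]), int(m["dev"]), int(m["count"])

def recurring_fexes(lines, threshold=3):
    """Return FEX ids that logged the warning at least `threshold` times.
    The threshold is an arbitrary example value."""
    seen = Counter(fex for fex, _dev, _count in ecc_events(lines))
    return sorted(fex for fex, n in seen.items() if n >= threshold)
```

Usage would be something like recurring_fexes(open("syslog.txt")), where syslog.txt is a hypothetical export of the switch log. What rate of corrections justifies a replacement is a judgment call best confirmed with TAC.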
