Aleksandar Vidakovic
Cisco Employee

Introduction

ES+ linecards on Cisco 7600 Series Routers use highly programmable components. Some of the issues observed on these cards have symptoms that would normally be interpreted as a hardware failure, e.g. double-bit or repeated single-bit parity errors.

Purpose Of This Document

This document provides an overview of known issues related to ES+ linecards on Cisco 7600 Series Routers, with a twofold purpose:

  1. increase awareness of the fact that the traditional notion of what is a HW failure and what is a SW failure may no longer apply
  2. help Cisco customers and partners evaluate issues observed on ES+ cards

This is not an exhaustive list. If your symptom does not match any of the bugs (DDTS entries) listed in this document, please run an additional search in the Bug Toolkit before opening a TAC Service Request.

Where To Look For Failure Symptoms

ES+ linecards have a local flash disk used for storing on-board logging data and for crashinfo files.

Locations where ES+ failure symptoms should be looked up are:

  • logging buffer on the active supervisor:       
    • relevant command: "show logging"
  • logging buffer on the ES+ linecard:      
    • relevant command: "remote command module <slot> show logging"
  • on-board logging file on the ES+ linecard's flash disk      
    • relevant command: "remote command module <slot> show logging onboard"
  • crashinfo and mini-crashinfo files on ES+ linecard's flash disk      
    • relevant command: "dir dfc#<slot>-bootdisk:", "more dfc#<slot>-bootdisk:<filename>"
    • NOTE: before running the "more" command, execute "terminal length 0"
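The lookup locations above can be gathered into a per-slot command list. The following is a minimal sketch: the helper name `es_plus_diag_commands` and its `crashinfo_file` parameter are illustrative assumptions, and only the command strings themselves are taken from the list above.

```python
def es_plus_diag_commands(slot, crashinfo_file=None):
    """Return the CLI commands listed above for the ES+ linecard in `slot`.

    `crashinfo_file` is a hypothetical parameter: a file name picked from the
    `dir` output, used only to build the `more` command.
    """
    commands = [
        "show logging",
        f"remote command module {slot} show logging",
        f"remote command module {slot} show logging onboard",
        f"dir dfc#{slot}-bootdisk:",
    ]
    if crashinfo_file:
        # Disable paging first so that "more" prints the whole file.
        commands.append("terminal length 0")
        commands.append(f"more dfc#{slot}-bootdisk:{crashinfo_file}")
    return commands


if __name__ == "__main__":
    for command in es_plus_diag_commands(2):
        print(command)
```

The list is only generated here, not executed; how the commands are sent to the router is left open.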

The on-board log may show isolated occurrences of single-bit parity errors. This should not be a concern because:

  1. isolated single-bit parity errors can be considered soft-parity errors, caused by sources external to the memory chip
  2. ECC logic on ES+ linecards corrects single-bit errors
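As a rough illustration of separating isolated single-bit events from repeated or double-bit ones, the sketch below counts ECC messages per slot, NP and memory. The message pattern follows the NP_DEV ECC_SINGLE and ECC_DOUBLE examples quoted in the list of known issues; the function names and the `single_bit_limit` threshold are assumptions made for this example, not Cisco-defined values.

```python
import re
from collections import Counter

# Pattern based on the NP_DEV ECC messages quoted in this document.
ECC_RE = re.compile(
    r"%NP_DEV-DFC(?P<slot>\d+)-3-ECC_(?P<kind>SINGLE|DOUBLE): "
    r".*?on NP (?P<np>\d+), Mem (?P<mem>\d+)"
)


def summarize_ecc(log_lines):
    """Count ECC events per (slot, NP, Mem, kind)."""
    counts = Counter()
    for line in log_lines:
        match = ECC_RE.search(line)
        if match:
            key = (int(match["slot"]), int(match["np"]),
                   int(match["mem"]), match["kind"])
            counts[key] += 1
    return counts


def needs_attention(counts, single_bit_limit=1):
    """Return keys that are double-bit, or single-bit seen more often than the limit.

    `single_bit_limit` is an assumed threshold for "isolated" occurrences.
    """
    return sorted(
        key for key, seen in counts.items()
        if key[3] == "DOUBLE" or seen > single_bit_limit
    )
```

A result from `needs_attention` is only a prompt to check the known issues and the installed IOS release; it does not by itself indicate a hardware fault.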

List Of Known Issues

    File created: 17 January 2013 10:24:28

    x40g: Failed to read register Id while reading NP registers val
    CSCsy88170 C3 c7600-es-platform

    Symptom: DFC3: ERROR! number: 0x80003902, NPprmReg_Read_NP_3c: register is not supported
    for NP-3c2.

    Conditions: Observed on the console or syslog of ES+ linecards of Cisco 7600 Series Routers.

    Workaround: None.

    Further Problem Description: Issue is cosmetic. Some registers are not meant to be read by the
    firmware on the chip. When the chip tries to read these registers, it prints the error.

    Traceback %X40G-DFC4-3-TCAM_MGR_HW_ERR: GTM HW ERROR
    CSCsz04660 R3 c7600-es-platform
    12.2(33.02.07)SRD 12.2(33)ZI 12.2(33)SRD03 12.2(32.08.19)REC186
    Symptom:
    On bootup or normal operations, a few ES+ cards might show the following traceback.

    %X40G-DFC4-3-TCAM_MGR_HW_ERR: GTM HW ERROR: TCAM device contains corrupted uninitialized data for channel


    Conditions:
    Observed on a small number of ES+ linecards of Cisco 7600 Series Routers.

    Workaround:
    None

    Further Problem Description:
    This message indicates that the TCAM consistency checker has detected a few TCAM entries that were not
    in an initialized state. The TCAM consistency checker has already corrected these TCAM entries.

    DBUS-HDR error in ES/ES+ Modules
    CSCtg31984 R3 c7600-es-platform
    15.1(00.03)S 15.0(01)S 15.0(00.13)S0.2 12.2(33.04.23)SRD 12.2(33)SRE02 12.2(33)SRD05 12.2(32.00.28)SRE

    Symptom:
    A 7600 with an ES/ES+ module may report the error EARL_L2_ASIC-DFC2-4-DBUS_HDR_ERR
    after boot-up. There is no functional impact to the switch due to this error.

    Conditions:
    7600 with ES/ES+ modules present. The problem can happen up to a few hours
    after boot up.

    Workaround:
    No workaround. Problem has been resolved in 12.2(33)SRD5 and 12.2(33)SRE2.

    ES+ ECC_DOUBLE: Double-bit ECC error or reset due to eznp_ecc_err_isr
    CSCth11714 R3 c7600-es-platform
    15.1(00.09)S 15.0(01)S 15.0(00.13)S0.8 12.2(33.04.23)SRD 12.2(33)SRE02 12.2(33)SRD05 12.2(32.00.35)SRE
    Symptom:

    7600 Series router with ES+ line card crashes reporting error:

    %NP_DEV-DFC2-3-ECC_DOUBLE: Double-bit ECC error detected on NP 0, Mem 19,
    SubMem 0x1,SingleErr 1, DoubleErr 1 Count 1 Total 1

    Another possible symptom is:

    %PM_SCP-SP-1-LCP_FW_ERR: System resetting module 1 to recover from error: eznp_ecc_err_isr: ECC intr
    handler for NP: 1 failed



    Conditions:

    Symptom observed on ES+ linecard of C7600 series routers.


    Workaround:

    None.

    Further Problem Description:

    Software fix is available in :
    12.2(33)SRD5 or higher
    12.2(33)SRE2 or higher
    15.0(1)S or higher

    If symptom persists after IOS upgrade please contact Cisco TAC.
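Whether a running release already contains this fix can be checked with a small version comparison. This is a sketch under stated assumptions: "or higher" is interpreted as a later rebuild within the same 12.2(33)SRD or 12.2(33)SRE train, or any 15.x S release from 15.0(1)S onwards; the function name `has_ecc_fix` is hypothetical; interim build strings and other trains are reported as unknown (`None`) rather than guessed.

```python
import re

# Matches release strings such as 12.2(33)SRD5, 12.2(33)SRE7a or 15.0(01)S04.
# Interim builds such as 15.0(00.13)S0.8 are deliberately not matched.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)\)([A-Z]+)(\d*)([a-z]?)$")

# First fixed rebuild per 12.2(33) train, taken from the list above.
FIRST_FIXED_REBUILD = {"SRD": 5, "SRE": 2}


def has_ecc_fix(version):
    """Return True or False if the release can be judged, otherwise None."""
    match = VERSION_RE.match(version.strip())
    if not match:
        return None
    major, minor, maint = (int(match.group(i)) for i in (1, 2, 3))
    train = match.group(4)
    rebuild = int(match.group(5) or 0)
    if (major, minor, maint) == (12, 2, 33) and train in FIRST_FIXED_REBUILD:
        return rebuild >= FIRST_FIXED_REBUILD[train]
    if train == "S" and major >= 15:
        return (major, minor, maint) >= (15, 0, 1)
    return None
```

For example, `has_ecc_fix("12.2(33)SRD4")` evaluates to `False` and `has_ecc_fix("12.2(33)SRE7a")` to `True`. A `None` result means the release should be checked manually in the Bug Toolkit.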

    Temperature 128 degC reported when sensor is Not_Operational
    CSCti80887 R3 c7600-es-platform
    15.1(01.04)S 15.1(01)S 15.1(00.18)S0.4 15.0(01)S02 15.0(00.13)S0.29 12.2(33.06.01)SRD 12.2(33.02.09)SRE 12.2(33)SRE03 12.2(33)SRD07
    Symptom:

    Faceplate LED on the linecard is red. Temperature sensor is reporting 128 degC.

    In addition, the following I2C error may be reported by the linecard, confirming that the temperature sensor
    cannot be read:

    I2C Read Error READ bus=0x1 addr=0x4D port_sel=0x0 flags = 0x0 cmd=0x0 size=2


    Conditions:

    Faulty sensor on an ES+ linecard of a C7600 Series Router.


    Workaround:

    None.


    Further Problem Description:

    This SW fix corrects the reporting of an invalid sensor. Under the same circumstances, 'NO' (Not Operational)
    will be reported instead of 128 degC.

    IOS fix for handling the Power calcuation issues with ES+ Combo cards
    CSCtn41667 R3 c7600-es-platform
    15.1(02.09)S 15.1(02)S01 15.1(02)S0.7 15.1(02)S0.6 15.0(01)S3.4 15.0(01)S04 12.2(33.03.06)SRE 12.2(33)SRE04
    Symptom:
    The following ES+ PIDs consume more power than the expected values:

    76-ES+XC-20G3C
    76-ES+XC-20G3CXL
    76-ES+XC-40G3C
    76-ES+XC-40G3CXL

    This might lead to other modules being powered down due to "power deny".

    Conditions:
    Specific to ES+XC variants (Combo cards) of Cisco 7600 Series Routers.

    Workaround:
    Configure "power redundancy-mode combined" until the IOS is upgraded to a release with correct power
    settings.

    Fix LC inlet temp issue (ES+XC) and Alarm handling issues (All ES+)
    CSCtn68668 R3 c7600-es-platform
    15.1(02.12)S 15.1(02)S01 15.1(02)S0.6 15.0(01)S3.4 15.0(01)S04 12.2(33.03.09)SRE 12.2(33)SRE04
    Symptoms: The following symptoms are observed:

    1. The STATUS LED on the line card faceplate is amber.
    2. The "remote command module <slot> show platform hardware environment temperature" command
    reports a high line card inlet temperature:

    Router#remote command mod 1 show plat hard env temp

    ----------------------------------------------------------
    Temperature and Threshold Table
    ----------------------------------------------------------
    Sensor         Minor       Major       Current
    ID             Threshold   Threshold   Temperature
    ----------------------------------------------------------
    BB Outlet 0    60          75          47
    BB Inlet 0     50          65          27
    BB Outlet 1    75          85          54
    BB Inlet 1     50          65          32
    PE Outlet      60          75          53
    PE Inlet       50          65          34
    LC Outlet      60          75          49
    LC Inlet       50          65          50   <<<<<<<<


    Conditions: This issue is specific to the following Cisco 7600 ES+ combo
    cards:

    76-ES+XC-20G3C
    76-ES+XC-20G3CXL
    76-ES+XC-40G3C
    76-ES+XC-40G3CXL

    Line card inlet sensor is inappropriately positioned in a place where
    temperatures are higher than on the inlet point.

    Workaround: There is no workaround.

    Further Problem Description: There are no problems with the functioning of
    the board. Only the external communication is affected. "BB Inlet 1" shows
    the actual inlet temperature. It can be used for reliable measurement of line
    card inlet temperature.
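The temperature table in the symptom above can be parsed to see which sensors have reached their minor threshold. This is a sketch only: the row format is inferred from the sample output in this entry, the function names are hypothetical, and treating "current equal to the minor threshold" as reached is an assumption.

```python
import re

# One row: sensor name, minor threshold, major threshold, current temperature.
ROW_RE = re.compile(
    r"^\s*(?P<sensor>[A-Z]{2} (?:Inlet|Outlet)(?: \d)?)\s+"
    r"(?P<minor>\d+)\s+(?P<major>\d+)\s+(?P<current>\d+)"
)


def parse_temperature_table(output):
    """Map sensor name -> (minor, major, current) from the command output."""
    readings = {}
    for line in output.splitlines():
        match = ROW_RE.match(line)
        if match:
            readings[match["sensor"]] = (
                int(match["minor"]), int(match["major"]), int(match["current"])
            )
    return readings


def sensors_at_minor_threshold(readings):
    """Return sensors whose current reading has reached the minor threshold."""
    return sorted(
        name for name, (minor, _major, current) in readings.items()
        if current >= minor
    )
```

On the affected ES+XC cards, a result containing only "LC Inlet" matches this bug; as stated above, "BB Inlet 1" is the reading to trust for the actual inlet temperature.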

    ES+: DEV_SELENE XAUI_LEN; FIFO_FULL; XAUI_GNT and XAUI_MIN errors
    CSCtq07626 R3 c7600-es-platform
    15.2(00.01)S 15.1(03)S 15.1(02.16)S0.3 15.1(02)S1.4 15.1(02)S02 15.0(01)S3.8 15.0(01)S04 12.2(33.06.01)SRD 12.2(33.03.17)SRE 12.2(33)SRE04 12.2(33)SRD07
    Symptom:
    Errors detected by the Selene ASIC:

    %DEV_SELENE-DFC1-3-XAUI_LEN
    %DEV_SELENE-DFC1-3-FIFO_FULL
    %DEV_SELENE-DFC1-3-XAUI_GNT
    %DEV_SELENE-DFC1-3-XAUI_MIN


    Conditions:
    Observed on ES+ linecards of Cisco 7600 Series Routers.


    Workaround:
    None.


    Further Problem Description:
    The listed error types are not HW failures. Instead of being reported through error messages, occurrences
    of these errors can be tracked through the CLI:
    "remote command module <slot> show platform hardware drops".

    ES+: Watchdog resets fail to write crashinfo; causing Keep Alive failure
    CSCtr74953 R3 c7600-es-platform
    15.2(01)S 15.1(03)S01 15.1(03)S0.6 15.0(01)S4.13 15.0(01)S05 12.2(33.04.11)SRE 12.2(33)SRE05
    Symptom:
    %OIR-SP-3-PWRCYCLE: Card in module 1, is being power-cycled off (Module not responding to Keep Alive
    polling)
    %C7600_PWR-SP-4-DISABLED: power to module in slot 1 set off (Module not responding to Keep Alive polling)


    There is no crashinfo file created.


    Conditions:
    Observed on ES+ linecards of Cisco 7600 Series Routers. This bug is specific to a condition where no
    other explanations exist for the failure of Keep Alive polling.

    Workaround:
    There is no workaround.

    Further Problem Description:
    This fix does not prevent the line card crash, but it prevents the silent crash. This fix ensures
    that a crashinfo will be written on the ES+ line card flash disk. It also ensures that the line card is
    reset as soon as the error condition is detected, as opposed to waiting for a Keep Alive failure.

    ES+: PCI read hang causes Keep Alive failure; fails to write crashinfo
    CSCts25729 R3 c7600-es-platform
    15.2(01.02)S 15.2(01)S 15.2(00.18)S0.3 15.1(03)S1.3 15.1(03)S02 12.2(33.04.14)SRE 12.2(33)SRE05
    Symptom:

    %OIR-SP-3-PWRCYCLE: Card in module 1, is being power-cycled off (Module not responding to Keep Alive
    polling)
    %C7600_PWR-SP-4-DISABLED: power to module in slot 1 set off (Module not responding to Keep Alive polling)


    There is no crashinfo file created.

    Conditions:
    Observed on ES+ linecards of Cisco 7600 Series Routers. This bug is specific to a condition where no
    other explanations exist for the failure of Keep Alive polling.

    Workaround:
    There is no workaround.

    Further Problem Description:
    This fix does not prevent the line card crash, but it prevents the silent crash. This fix ensures
    that a crashinfo and mini crashinfo will be written on the ES+ line card flash disk. It also ensures that
    the line card is reset as soon as the error condition is detected, as opposed to waiting for a Keep Alive
    failure.

    ES+: Ingress traffic will not pass in some cases
    CSCtt13344 R3 c7600-es-platform
    15.2(01.11)S 15.2(01)S 15.2(00.18)S0.11 12.2(33.05.06)SRE 12.2(33)SRE06
    Symptom:

    Packets are silently dropped.

    Conditions:

    Observed on 7600 Series Router ES+ linecards. There are no specific conditions for this. Symptom is more
    likely to be observed on very busy NPs with Gigabit interfaces (as opposed to TenGig), with MTU explicitly
    configured.

    Workaround:

    Available to TAC.

    Further Problem Description:

    Complete fix available through CSCty22112.

    ES+ not sending multicast traffic on EVC
    CSCty51740 D3 c7600-es-platform

    Symptom:
    Packets are silently dropped.

    Conditions:
    Observed on 7600 Series Router ES+ linecards. There are no specific conditions for this. Symptom is more
    likely to be observed on very busy NPs with Gigabit interfaces (as opposed to TenGig), with MTU explicitly
    configured.

    Workaround:
    Available to TAC.

    ES+: Machine check exception crash (vector 200) (Ref: CSCub39296 also)
    CSCua51760 R3 c7600-es-platform
    15.3(00.15)S 15.1(03)S3.15 15.1(03)S04
    Symptom:

    Unexpected exception to CPU: vector 200, PC = 0x0

    Traceback decode is irrelevant.

    Conditions:

    Observed on ES+ series linecards of Cisco 7600 Series Routers.

    Workaround:

    There is no workaround.

    ES+: TCAM_MGR_HW_ERR: TCAM device had corrupted data errors
    CSCtc17311 R2 c7600-es-platform
    12.2(33.03.12)SRD 12.2(33.01.06)MCP07 12.2(33)SRE01 12.2(33)SRD04
    Symptoms: TCAM device is reporting corrupted data:

    %X40G-DFC2-3-TCAM_MGR_HW_ERR: GTM HW ERROR: TCAM device had corrupted data, the error is corrected for
    channel ...

    Conditions: Observed on ES+ linecards of Cisco 7600 Series Routers, by a background TCAM consistency
    checker.

    Workaround: There is no workaround.

    Further Problem Description: These messages can safely be ignored as the entries are already corrected.

    ES+: ECC_DOUBLE: Double-bit ECC error detected on NP - High T; Normal V
    CSCtd66014 R2 c7600-es-platform
    12.2(33.03.14)SRD 12.2(33.01.07)MCP07 12.2(33)ZI 12.2(33)SRE01 12.2(33)SRE00a 12.2(33)SRD04 12.2(32.00.11)SRE
    Symptoms: ES+ line card crashes at powerup of a Cisco 7600 router that is
    running Cisco IOS 12.2SRE image if either the Traffic Manager or Frame
    memories in the ES+ Network processors report a double bit ECC error. The ES+
    line card crashinfo will have the following string:

    %NP_DEV-DFC2-3-ECC_DOUBLE: Double-bit ECC error detected on NP 0, Mem 19,
    SubMem 0x1,SingleErr 1, DoubleErr 1 Count 1 Total 1

    Conditions: Router reloads, OIR of ES+ cards, system environment temperatures
    that slowly vary around an ambient temperature of about 30 degrees C. This
    happens at system powerup. Double-bit ECC problems have also been reported
    after a few hours of traffic if the ambient temperature varies around 30
    degrees C.

    Workaround: No configuration workaround is available. The line card will
    reset itself and will be operational in the second reload.

    ES+: ECC_SINGLE or ECC_DOUBLE error detected on NP
    CSCtd99244 R2 c7600-es-platform
    12.2(33.03.15)SRD 12.2(33.01.07)MCP07 12.2(33)ZI 12.2(33)SRE01 12.2(33)SRE00a 12.2(33)SRD04 12.2(32.00.11)SRE
    Symptoms:

    7600 series router with ES+ line card crashes reporting single bit or double bit ECC error.

    %NP_DEV-DFC2-3-ECC_SINGLE: Single-bit ECC error detected on NP 0, Mem 18, SubMem
    0x1,SingleErr 1, DoubleErr 0 Count 1 Total 1

    %NP_DEV-DFC2-3-ECC_DOUBLE: Double-bit ECC error detected on NP 0, Mem 19, SubMem
    0x1,SingleErr 1, DoubleErr 1 Count 1 Total 1

    Conditions:

    Symptom observed on ES+ linecard of C7600 series routers, usually in the initial phases of line card
    bootup, but this has also been reported after a few hours of traffic through the ES+ line card ports.


    Workaround:

    There is no workaround.

    Further Problem Description:

    Software fix is available in :
    12.2(33)SRD5 or higher
    12.2(33)SRE2 or higher
    15.0(1)S or higher

    If symptom persists after IOS upgrade please contact Cisco TAC.

    ES+: ECC_DOUBLE: Double-bit ECC error detected on NP
    CSCtd99248 R2 c7600-es-platform
    12.2(33.03.15)SRD 12.2(33.01.07)MCP07 12.2(33)ZI 12.2(33)SRE01 12.2(33)SRE00a 12.2(33)SRD04 12.2(32.00.11)SRE
    Symptoms:

    On 7600 series routers with ES+ line cards, occasional double-bit ECC errors may be reported for the traffic
    manager and other metadata memories of the Network processor on the ES+ line card.


    Example error message:
    %NP_DEV-DFC9-3-ECC_DOUBLE: Double-bit ECC error detected on NP 3, Mem 18, SubMem 0x1,SingleErr 1, DoubleErr
    1 Count 1 Total 1


    Conditions:

    This symptom is observed during router reloads, OIR of ES+ cards, and when system environment temperatures
    slowly vary around an ambient temperature of about 30 degrees C. This happens at system power up.
    Double-bit ECC errors have also been reported after a few hours of traffic.

    Workaround: No configuration workaround is available. The line card will
    reset itself and will be operational in the second reload.


    Further Problem Description:

    Software fix is available in :
    12.2(33)SRD5 or higher
    12.2(33)SRE2 or higher
    15.0(1)S or higher

    If symptom persists after IOS upgrade please contact Cisco TAC.

    Invalid LinkFPGA or LINKFPGA Bus Error
    CSCte14535 R2 c7600-es-platform
    12.2(33.03.16)SRD 12.2(33.01.07)MCP07 12.2(33)SRE01 12.2(33)SRD04 12.2(32.00.13)SRE
    Symptom:

    Possible symptoms are:

    %FPD_MGMT-3-INVALID_IMG_VER: Invalid ... LinkFPGA .. image version detected for ... card in slot-dc ...

    %FPD_MGMT-6-UPGRADE_PASSED: ... LinkFPGA ... image in the ... card in slot-dc 7-2 has been successfully
    updated from version ?.? to version ...
    %C7600_ES-2-IOFPGA_IO_BUS_ERROR: C7600-ES Line Card IOFPGA IO LINKFPGA Bus Error

    Conditions:
    Observed during boot/reload of ES+ line card in Cisco 7600 Series Routers. Rare in normal working ES+
    cards.

    Workaround:
    This fix is an enhancement which adds an additional recovery cycle for reading the LinkFPGA.

    Further Problem Description:
    The link FPGA should recover in the next recovery reload of the ES+. If the recovery does not happen
    after 3 consecutive times, then a persistent hardware fault may be the reason. Contact TAC for RMA procedures.
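The escalation rule in the description above (three consecutive failed recovery reloads) can be expressed as a small check. This is an illustrative sketch: the function name and the boolean list input are hypothetical, and how recovery outcomes are collected is left open.

```python
def linkfpga_needs_tac(recovery_results, limit=3):
    """Return True if the last `limit` recovery reloads all failed.

    `recovery_results` is a list of booleans, oldest first, where True means
    the LinkFPGA recovered on that reload (a hypothetical input format).
    """
    recent = recovery_results[-limit:]
    return len(recent) == limit and not any(recent)
```

With fewer than three recorded reloads the function returns `False`, since the threshold described above has not yet been reached.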

    Low-queue ES+: ECC_DOUBLE: Double-bit ECC error detected on NP; Mem 16
    CSCth15790 R2 c7600-es-platform
    15.1(00.11)S 15.0(01)S 15.0(00.13)S0.9 12.2(33.04.17)SRD 12.2(33.03.25)SRD 12.2(33)SRE02 12.2(33)SRD05 12.2(32.00.36)SRE
    Symptoms:

    %NP_DEV-DFC9-3-ECC_DOUBLE: Double-bit ECC error detected on NP 3, Mem 16, SubMem
    0x1,SingleErr 1, DoubleErr 1 Count 1 Total 1

    Conditions:
    Symptom observed on Low-queue ES+ line cards (ES+T) of C7600 series routers, in NP Mem 16.

    Workaround:
    There is no workaround.

    Further Problem Description:
    If symptom persists after IOS upgrade please contact Cisco TAC.

    ENV-4-MINORTEMPALARM - updating the new temperature thresholds for ES+
    CSCth25959 R2 c7600-es-platform
    15.1(00.09)S 15.0(01)S 15.0(00.13)S0.9 12.2(33.04.16)SRD 12.2(33.03.25)SRD 12.2(33)SRE02 12.2(33)SRD05 12.2(32.00.35)SRE
    Symptom:
    Temperature alarm (ENV-4-MINORTEMPALARM) is reported, with AMBER LED on the line card faceplate.

    Conditions:
    7600 series router with any model of the ES+ line card.

    Workaround:
    No workaround.

    Further Problem Description:
    Temperature thresholds were set too low before this bug fix. Correct settings are:

    --------------------------------------------
    Sensor         Minor       Major
    ID             Threshold   Threshold
    --------------------------------------------
    BB Outlet 0    65          80
    BB Outlet 1    70          85
    --------------------------------------------

    It is also recommended to evaluate the related bug CSCtn68668.

    Remove show platform hardware config-pld from show tech
    CSCti78408 R2 c7600-es-platform
    15.1(01.04)S 15.1(01)S 15.1(00.18)S0.4 15.0(01)S01 15.0(00.13)S0.24 12.2(33.04.29)SRD 12.2(33.02.05)SRE 12.2(33)SRE03 12.2(33)SRD05

    Symptoms:

    %SYS-DFC4-3-CPUHOG: Task is running for (128000)msecs, more than (2000)msecs (4/3),process = console_rpc_server_action.


    %SYS-DFC4-2-WATCHDOG: Process aborted on watchdog timeout, process = console_rpc_server_action.

    Conditions:
    This issue can occur under two conditions:
    1. Issuing the "show tech" command on ES+
    2. Issuing "show platform hardware config-pld" on ES+

    Workaround:
    Do not use "show tech" or "show platform hardware config-pld" on ES+.

    ES+: LONGBUSYREAD: C2W Interface busy for long time reading temp sensor
    CSCtr74529 R2 c7600-es-platform
    15.2(00.15)S 15.1(03)S01 15.1(03)S0.4 15.1(02)S1.10 15.1(02)S02 15.0(01)S4.4 15.0(01)S05 12.2(33.06.01)SRD 12.2(33.04.08)SRE 12.2(33)SRE05 12.2(33)SRD07
    Symptoms:

    %ENVM-4-LONGBUSYREAD: C2W Interface busy for long time reading temperature sensor

    Conditions: Observed on ES+ linecard of Cisco 7600 Series Routers.

    Workaround: There is no workaround.

    Crash on ES+ on issuing show tech or show platform hardware version
    CSCtz30983 R2 c7600-es-platform
    15.3(00.01)S 15.2(02)S1.6 15.2(02)S02 15.1(03)S3.14 15.1(03)S04 15.0(01)S06 12.2(33.05.31)SRE 12.2(33)SRE07
    Symptoms: Crash on ES+ line card upon issuing the "show hw-module slot X tech-support"
    or "show platform hardware version" command. This is similar to, but not the same
    as, CSCti78408.

    Conditions: This symptom occurs on an ES+ line card.

    Workaround: Do not issue the "show hw-module slot X tech-support" or "show
    platform hardware version" command on an ES+ line card unless explicitly
    instructed by Cisco.

    Link FPGA Update Failures with Different signatures
    CSCth20868 C2 c7600-esm-20

    Symptom:

    ES+ card crashes with different failure messages during production. In most cases, the initial
    message for the reload will be an FPD upgrade failure after multiple attempts.

    The crash messages in this case will differ between bootup attempts. These messages can be
    System Exception, FPD upgrade failure, or IOFPGA bus error. Example messages follow.

    Initial symptom would be:

    %FPD_MGMT-3-INVALID_IMG_VER: Invalid 20x1G LinkFPGA (FPD ID=7) image version detected for 7600-ES+20G
    card in slot-dc 7-2.

    IOFPGA bus error symptom:

    %C7600_ES-DFC7-2-IOFPGA_IO_BUS_ERROR: C7600-ES Line Card IOFPGA IO LINKFPGA Bus Error:

    and other system Exceptions.


    Conditions:

    Symptom observed during boot-up of 7600-ES+ linecards.


    Workaround:

    None.

    ES+ Bridge ASIC get locked up and stops forwarding
    CSCtz51545 C2 c7600-mpls

    Symptom:

    ES+ stops forwarding traffic

    Error messages like below will be seen on hitting this problem:

    *Mar 29 20:33:55.283: %FABRIC-SP-6-TIMEOUT_ERR: Fabric in slot 5 reported timeout error for channel 17
    (Module 9, fabric connection 1)no
    *Mar 29 20:33:55.507: %FABRIC_INTF_ASIC-DFC9-5-FABRICSYNC_DONE: Fabric ASIC 1 Channel 1: Fabric sync
    do

    Conditions:

    1) VRF configurations in conjunction with any of the below (and "mls mpls recir-agg" not configured):

    i) core paths are enabled with MPLS TE-FRR (or) BGP PIC (or) IP-FRR
    ii) presence of Distributed EtherChannel with SVI combination towards VRF access.

    2) Presence of GREoMPLS cases with any of the below (and "mls mpls tunnel-recir" not configured):
    i) core paths are enabled with MPLS TE-FRR (or) BGP PIC (or) IP-FRR

    The issue is not always seen with the above configurations; it occurs intermittently, depending on
    transient conditions.

    Workaround:

    If it is condition 1, enable "mls mpls recir-agg".
    If it is condition 2, enable "mls mpls tunnel-recir".

    If neither of the above resolves the traffic blackholing, auto recovery can be enabled using the
    CLI "hw-module slot mp-recovery-enable".

    The above cli will be available from 15.2(02.18)S release.
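A saved running configuration can be checked for the two workaround commands named above. This is a minimal sketch: it assumes the commands appear verbatim as their own lines in the configuration text, it does not detect whether condition 1 or 2 actually applies (the operator supplies that), and the function name is hypothetical.

```python
WORKAROUND_COMMANDS = {
    1: "mls mpls recir-agg",     # condition 1: VRF cases
    2: "mls mpls tunnel-recir",  # condition 2: GREoMPLS cases
}


def missing_workarounds(running_config, conditions):
    """Return workaround commands for `conditions` that are absent from the config.

    `conditions` is an iterable of condition numbers (1 and/or 2) that the
    operator has already determined to apply; this sketch does not detect them.
    """
    configured = {line.strip() for line in running_config.splitlines()}
    return [
        WORKAROUND_COMMANDS[number]
        for number in sorted(set(conditions))
        if WORKAROUND_COMMANDS[number] not in configured
    ]
```

An empty result means the relevant commands are already present, in which case the auto-recovery option described above is the remaining step.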

    ES+ ROMMON: MPC8548 DDR20 errata fix for Multi-bit ECC errors
    CSCtb76621 R3 c7600-system
    12.2(33r)SRD07 12.2(32.08.27)REC186
    Symptom:

    %C6K_MEM_ECC-DFCx-2-MBE: Multiple bit error detected at ...
    %C6K_MEM_ECC-DFCx-3-SYNDROME_MBE: 8-bit Syndrome for the detected Multi-bit error: ...
    %C7600_MEM_ECC-DFCx-2-MBE: Multiple bit error detected at ...
    %C7600_MEM_ECC-DFCx-3-SYNDROME_MBE: 8-bit Syndrome for the detected Multi-bit error: ...

    Conditions:

    Observed on ES+ line card of Cisco 7600 Series Router.

    Workaround:

    There is no workaround.

    Further Problem Description:
    This fix is integrated in the 12.2(33r)SRD7 ROMMON image for the ES+ card. The SRD7 ROMMON image is bundled into
    the IOS package for Cisco 7600 Series Routers starting from 15.0(1)S. Cisco 7600 Series Routers running an
    image from the 12.2(33)SRD or 12.2(33)SRE releases may also run the SRD7 ROMMON. If affected by this issue, contact
    Cisco TAC and request the 12.2(33r)SRD7 image. Please refer to this link for the ROMMON upgrade procedure:
    http://www.cisco.com/en/US/docs/routers/7600/rommon/rsp720_rommon.html#wp180816

    ES+ Metropolis lock-up: Recovery fix.
    CSCty93833 R3 c7600-system
    15.2(02.18)S 15.2(02)S1.9 15.2(02)S02 15.1(03)S3.14 15.1(03)S04 12.2(33.06.09)SRE 12.2(33)SRE07
    Symptom:
    Metropolis on ES+ line card gets stuck (stops processing packets) leading to traffic loss and in turn
    service impact.

    Conditions:
    For the problem to surface, all of the following conditions must be present:
    1. The re-write instruction from Earl *must* involve removal as well as addition of a few bytes.
    2. The instruction to remove bytes *must* occur as a result of a look-up in the VPN CAM (an internal block
    within Earl).
    3. The packet size *must* be such that the re-write causes the packet to spill over into the next line
    (thus overwriting the start of the next packet).
    4. The re-write instruction *must* also force the packet to be re-circulated.

    All four conditions above *must* be satisfied for the lock-up to occur.

    Workaround:
    1. Eliminating any of the above conditions will solve the problem.
    2. Apply earl-patch to recover.

    ES+ : Distinguish Earl Inlet and outlet temp reading during error case
    CSCua78310 D3 c7600-system

    Symptom:
    EARL Inlet and Outlet Sensor on ES+ card report the same temperature reading.

    Conditions:
    None.

    Workaround:
    None.

    [obselete]ES+ : Incorrect temperature Displayed in EARL INLET/OUTLET
    CSCua78386 R3 c7600-system
    15.3(01)S 15.3(00.18)S
    Symptom:
    Incorrect temperature is displayed for EARL INLET/OUTLET.
    The displayed temperature may be stuck at the erroneous value.

    Conditions:

    Occurs on ES+ LC.
    Appears to be caused by a spurious write to the sensor device register.


    Workaround:
    None.

    ES+ : Incorrect temperature Displayed in EARL INLET/OUTLET
    CSCub74451 R3 c7600-system
    15.3(01.01)S 15.2(04)S01 15.2(04)S0.5 12.2(33.07.02)SRE 12.2(33)SRE07a
    Symptoms: EARL inlet/outlet displays incorrect temperature values. If the
    temperature crosses the minor/major threshold false alarms will be generated.
    In case of a major alarm the linecard will shut down as a preventive measure.

    Conditions: There is no trigger for the issue.

    Workaround: Reload the linecard.

    Initialize TM external frame and control memories in ES+
    CSCtq85884 R3 c7600-system
    15.2(02.09)S 15.2(02)SNG 15.2(02)S01 15.2(02)S0.1 15.2(01)S02 15.1(03)S2.18 15.1(03)S03 15.1(03)MR 15.0(01)S5.10 15.0(01)S06 12.2(33.05.26)SRE 12.2(33)SRE07
    Symptom:
    ECC double-bit error followed by a line card crash. Sample symptom:

    %NP_DEV-DFC5-3-ECC_DOUBLE: Double-bit ECC error detected on NP 1, Mem 17, SubMem 0x1,SingleErr 1, DoubleErr
    1 Count 1 Total 1

    Condition:
    Reported by ES+ line card on Cisco 7600 Series Routers, during the boot of the linecard.

    Workaround:
    None.

    Further Problem Description:
    This fix introduces an additional memory check during line card boot. As a consequence, it may expose
    during line card boot some previously undetected failures. These failures would otherwise restart the
    line card during normal operation.

    ES+: silent packet drops
    CSCty22112 R3 c7600-system
    15.2(02.14)S 15.2(02)S01 15.2(02)S0.1 15.2(01)S1.6 15.2(01)S02 15.1(03)S2.16 15.1(03)S03 15.0(01)S5.14 15.0(01)S06 12.2(33.05.25)SRE 12.2(33)SRE06
    Symptom:
    Packets are silently dropped.

    Conditions:
    Observed on 7600 Series Router ES+ linecards. There are no specific conditions
    for this. Symptom is more likely to be observed on very busy NPs with Gigabit
    interfaces (as opposed to TenGig), with MTU explicitly configured.

    Workaround:
    Please refer to Note to TAC enclosure for the workaround

    ES+: FABRICCRCERRS after SSO due to Metropolis lockup
    CSCto55567 R2 c7600-system
    15.2(00.06)S 15.1(03)S 15.1(02.16)S0.4 15.0(01)S3.12 15.0(01)S04 12.2(33.04.05)SRE 12.2(33)SRE05
    Symptoms: Line card reports fabric errors:

    %FABRIC_INTF_ASIC-DFC9-4-FABRICCRCERRS: Fabric ASIC 0: 322 Fabric CRC error events in 100ms period

    Also, TestMacNotification and TestFabricCh0Health diagnostic tests are failing.


    Conditions: Symptom is observed on ES+ line cards of C7600 Series Routers after SSO with multicast traffic
    flowing through the line card.

    Workaround: Soft reload the line card using the "hw-module module <slot> reset" exec
    command.

    ES+: single occurrence of DEV_SELENE XAUI_CODE error
    CSCtr37182 R2 c7600-system
    15.2(00.12)S 15.1(03)S 15.1(02.16)S0.13 15.0(01)S3.14 15.0(01)S04 12.2(33.06.01)SRD 12.2(33.04.07)SRE 12.2(33)SRE05 12.2(33)SRD07
    Symptoms: Single occurrence of XAUI_CODE and XAUI_RX_RDY message in the syslog:

    %DEV_SELENE-DFC1-3-XAUI_CODE: Selene 1 XAUI 1 Coding Error
    %DEV_SELENE-DFC1-3-XAUI_RX_RDY: Selene 1 XAUI 1 Rx Rdy changed state

    Conditions: This symptom is observed on ES+ linecards of Cisco 7600 series router.

    Workaround: There is no workaround.

    Further Problem Description: Single occurrence of this error can safely be ignored.

    ISSU Standby reset due to MCL failure "hw-module slot1reset-recycle-bu"
    CSCtu30649 R2 c7600-system
    15.2(01.15)S 15.2(01)S 15.2(00.18)S0.13 15.1(03)S1.7 15.1(03)S02 15.0(01)S4.20 15.0(01)S05 12.2(33.05.09)SRE 12.2(33)SRE06
    Symptoms: Standby is reset.

    Conditions: This issue is seen when the ISSU standby is reset because of MCL
    failure.

    Workaround: There is no workaround.

    Enhance the temperature alarm detection logic on ES+ cards
    CSCtz43626 R2 c7600-system
    15.3(00.18)S 15.2(04)S01 15.2(04)S0.5 15.1(03)S4.14 12.2(33.07.02)SRE 12.2(33)SRE07a
    Symptoms: Minor or major temperature alarms reported in the syslog:

    %C7600_ENV-SP-4-MINORTEMPALARM: module 2 aux-1 temperature crossed threshold
    #1(=60C). It has exceeded normal operating temperature range.

    %C7600_ENV-SP-4-MINORTEMPALARM: EARL 2/0 outlet temperature crossed threshold
    #1(=60C). It has exceeded normal operating temperature range.

    Conditions: The symptom is observed on ES+ series linecards of Cisco 7600
    series routers. Specifically, the reported temperature will be far off from
    the readings of other sensors on the linecard.

    Workaround: There is no workaround.

    ES+: Machine check exception crash (vector 200)
    CSCub39296 R2 c7600-system
    15.3(01.02)S 15.3(01)S0.7 15.3(01)S 15.3(00.20)S0.3 15.2(04)S1.3 15.2(04)S02 15.1(03)S4.12 12.2(33.07.02)SRE 12.2(33)SRE07a
    Symptoms: Unexpected exception to CPU: vector 200, PC = 0x0. Traceback decode
    is irrelevant.

    Conditions: The symptom is observed on the ES+ series linecards on a Cisco 7600
    series router. Symptom is reported on the ES+ console and in the crashinfo file
    on the ES+ flash disk. It is not reported in the syslog.

    Workaround: There is no workaround.

    ES+: ECC_DOUBLE: Double-bit ECC error detected on NP; Mem 17
    CSCtn95122 R2 c7600-system
    15.2(00.09)S 15.1(03)S 15.1(02.16)S0.9 15.0(01)S3.12 15.0(01)S04 12.2(33.04.07)SRE 12.2(33)SRE05
    Symptoms: The ECC double-bit error is reported in syslog, followed by a linecard crash:

    %NP_DEV-DFC5-3-ECC_DOUBLE: Double-bit ECC error detected on NP ... Mem 17

    Conditions: Observed on ES+ linecards of C7600 Series Routers when heavy configuration changes are applied
    to the linecard. In addition, there are other unknown race conditions that can cause this. This bug-fix
    is specific to Double-bit errors on Mem 17.

    Workaround: There is no workaround.
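To speed up a first pass over a syslog, the message mnemonics quoted in the entries above can be mapped to their bug IDs. The table below is transcribed from this document only and is a sketch, not an authoritative index: the names `KNOWN_ISSUES` and `candidate_bugs` are hypothetical, the regular expression assumes the usual %FACILITY[-SUBFACILITY]-SEVERITY-MNEMONIC: layout seen in the samples, and a match is only a starting point for reading the corresponding entry.

```python
import re

# Mnemonic -> bug IDs, transcribed from the entries in this document.
KNOWN_ISSUES = {
    "TCAM_MGR_HW_ERR": ["CSCsz04660", "CSCtc17311"],
    "DBUS_HDR_ERR": ["CSCtg31984"],
    "ECC_SINGLE": ["CSCtd99244"],
    "ECC_DOUBLE": ["CSCth11714", "CSCtd66014", "CSCtd99244", "CSCtd99248",
                   "CSCth15790", "CSCtq85884", "CSCtn95122"],
    "XAUI_LEN": ["CSCtq07626"],
    "FIFO_FULL": ["CSCtq07626"],
    "XAUI_GNT": ["CSCtq07626"],
    "XAUI_MIN": ["CSCtq07626"],
    "XAUI_CODE": ["CSCtr37182"],
    "XAUI_RX_RDY": ["CSCtr37182"],
    "LONGBUSYREAD": ["CSCtr74529"],
    "FABRICCRCERRS": ["CSCto55567"],
    "IOFPGA_IO_BUS_ERROR": ["CSCte14535", "CSCth20868"],
    "INVALID_IMG_VER": ["CSCte14535", "CSCth20868"],
}

# %FACILITY[-SUBFACILITY]-SEVERITY-MNEMONIC:
MNEMONIC_RE = re.compile(r"%[A-Z0-9_]+(?:-[A-Z0-9_]+)?-\d-([A-Z0-9_]+):")


def candidate_bugs(syslog_line):
    """Return bug IDs from this document whose symptom uses the line's mnemonic."""
    match = MNEMONIC_RE.search(syslog_line)
    if not match:
        return []
    return KNOWN_ISSUES.get(match.group(1), [])
```

An empty result does not mean the symptom is unknown; as noted in the introduction, the list is not exhaustive and the Bug Toolkit should still be searched.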

Comments

Hello Aleksandar,

One of the customers is experiencing the issue described in CSCsw31515. Is it possible that the root cause is h/w? How can you tell the difference between h/w and transient s/w issues?

Thanks, kind regards,

Eduard

Aleksandar Vidakovic
Cisco Employee

Hi Eduard,

I tried to reply via email back in November, but it seems the email didn't get through.

I suppose you are referring to ECC double-bit errors. There is no direct way to distinguish between a HW and a SW issue. As a general recommendation, I can only ask you to upgrade to an IOS release that includes all the relevant fixes. If your customer has experienced repeated ECC errors on multiple linecards, please do open a Service Request.

Regards,

Aleksandar

lukas.tribus
Level 1

Also related:

CSCud19230

Symptoms: ES+ line card reload occurs with the following error messages:

%PM_SCP-SP-1-LCP_FW_ERR: System resetting module 2 to recover from error: x40g_iofpga_interrupt_handler: LINKFPGA IOFPGA IO Bus Err val: 4214784 Bus Error Add:332 Bus Err data: 0

%OIR-SP-3-PWRCYCLE: Card in module 2, is being power-cycled Off (Module Reset due to exception or user request)

%C7600_PWR-SP-4-DISABLED: power to module in slot 2 set Off (Module Reset due to exception or user request)

Conditions: This symptom is observed with the ES+ line card.

Workaround: There is no workaround.

Fixed in 12.2(33)SRE8 and recent 15.x rebuilds.

smailmilak
Level 4

Hi Aleksandar,

we had the following issue.

A C7609 running 12.2(33)SRE7a with multiple ES+ LCs had a failure on a 7600-ES20-GE3CXL where the first 10 ports did not forward any traffic. A reload solved the issue.

It looks like the first 10XGE Port had a problem.

I could not find any onboard logs or errors.

What could be the issue?

lukas.tribus
Level 1

Seems like an issue related to one NP. You could have just reset or reseat that particular linecard instead of reloading the chassis.

 

Anyway, your card is EOL/EOS with End of SW Maintenance dating back to March, 2012. Afaik, those EOL'ed ES (non-plus) cards have at least one unfixed serious software issue (crashes). I would throw them out as soon as possible.

http://www.cisco.com/c/en/us/products/collateral/routers/7600-series-routers/eol_c51_577514.html

 

smailmilak
Level 4

Hi Lukas,

we did a reload of the LC, not the chassis. We had similar problems on other hardware, and a reload is always performed because it helps most of the time and it's much faster than typing show commands.

It would be nice if I could find a bug that matches this issue, even if it is from after March 2012.

fly
Level 2

Dear Aleksandar,

We are using a 76-ES+T-4TG card in slot 9.

Interface 9/1 faces the core router and 9/3 faces the customer. We configured a 900-entry ACL inbound on interface 9/1 to protect our customers from attacks.

Now we have found a problem: pinging the directly connected Layer 3 neighbor from the 7609 works, but when we ping customers behind interface 9/3 from the upstream Layer 3 core router, packets are lost. The packets arrive at interface 9/1 but are not all sent out of interface 9/3: of 100 ping packets, only 91 were sent out of 9/3 towards the customers.

The IOS version is sup-bootdisk:c7600rsp72043-advipservices-mz.122-33.SRD4.bin

Thank you,

Tom
