03-06-2012 09:14 AM - edited 03-01-2019 04:46 PM
ES+ linecards on Cisco 7600 Series Routers use highly programmable components. Some of the issues observed on these cards had symptoms that would normally be interpreted as a hardware failure, e.g. double-bit or repeated single-bit parity errors.
This document provides an overview of known issues related to ES+ linecards on Cisco 7600 Series Routers, with a twofold purpose:
This is not an exhaustive list. If your symptom does not match any of the bugs listed in this document, please run an additional search in the Bug Toolkit before opening a TAC Service Request.
ES+ linecards have a local flash disk used for storing on-board logging data and for crashinfo files.
Locations where ES+ failure symptoms should be looked up are:
The on-board log may show isolated occurrences of single-bit parity errors. This should not be a concern because:
File created: 17 January 2013 10:24:28
x40g: Failed to read register Id while reading NP registers val | |||
CSCsy88170 | C | 3 | c7600-es-platform |
Symptom: DFC3: ERROR! number: 0x80003902, NPprmReg_Read_NP_3c: register is not supported for NP-3c2. Conditions: Observed on the console or syslog of ES+ linecards of Cisco 7600 Series Routers. Workaround: None. Further Problem Description: Issue is cosmetic. Some registers are not meant to be read by the firmware on the chip. When the chip tries to read these registers, it prints the error. |
Traceback %X40G-DFC4-3-TCAM_MGR_HW_ERR: GTM HW ERROR | |||
CSCsz04660 | R | 3 | c7600-es-platform |
12.2(33.02.07)SRD 12.2(33)ZI 12.2(33)SRD03 12.2(32.08.19)REC186 | |||
Symptom: On bootup or normal operations, a few ES+ cards might show the following traceback. %X40G-DFC4-3-TCAM_MGR_HW_ERR: GTM HW ERROR: TCAM device contains corrupted uninitialized data for channel Conditions: Observed on a small number of ES+ linecards of Cisco 7600 Series Routers. Workaround: None Further Problem Description: This message indicates that the TCAM consistency checker has detected a few TCAM entries that were not in the initialized states. The TCAM consistency checker has already corrected these TCAM entries. |
DBUS-HDR error in ES/ES+ Modules | |||
CSCtg31984 | R | 3 | c7600-es-platform |
15.1(00.03)S 15.0(01)S 15.0(00.13)S0.2 12.2(33.04.23)SRD 12.2(33)SRE02 12.2(33)SRD05 12.2(32.00.28)SRE | |||
Symptom: 7600 with ES/ES+ module may report error EARL_L2_ASIC-DFC2-4-DBUS_HDR_ERR after boot up. There is no functional impact to the switch due to this error. Conditions: 7600 with ES/ES+ modules present. The problem can happen up to a few hours after boot up. Workaround: No workaround. Problem has been resolved in 12.2(33)SRD5 and 12.2(33)SRE2. |
ES+ ECC_DOUBLE: Double-bit ECC error or reset due to eznp_ecc_err_isr | |||
CSCth11714 | R | 3 | c7600-es-platform |
15.1(00.09)S 15.0(01)S 15.0(00.13)S0.8 12.2(33.04.23)SRD 12.2(33)SRE02 12.2(33)SRD05 12.2(32.00.35)SRE | |||
Symptom: 7600 Series router with ES+ line card crashes reporting error: %NP_DEV-DFC2-3-ECC_DOUBLE: Double-bit ECC error detected on NP 0, Mem 19, SubMem 0x1,SingleErr 1, DoubleErr 1 Count 1 Total 1 Another possible symptom is: %PM_SCP-SP-1-LCP_FW_ERR: System resetting module 1 to recover from error: eznp_ecc_err_isr: ECC intr handler for NP: 1 failed Conditions: Symptom observed on ES+ linecard of C7600 series routers. Workaround: None. Further Problem Description: Software fix is available in : 12.2(33)SRD5 or higher 12.2(33)SRE2 or higher 15.0(1)S or higher If symptom persists after IOS upgrade please contact Cisco TAC. |
Temperature 128 degC reported when sensor is Not_Operational | |||
CSCti80887 | R | 3 | c7600-es-platform |
15.1(01.04)S 15.1(01)S 15.1(00.18)S0.4 15.0(01)S02 15.0(00.13)S0.29 12.2(33.06.01)SRD 12.2(33.02.09)SRE 12.2(33)SRE03 12.2(33)SRD07 | |||
Symptom: Faceplate LED on the linecard is red. Temperature sensor is reporting 128 degC. In addition, the following I2C error may be reported by the linecard, confirming that the temperature sensor cannot be read: I2C Read Error READ bus=0x1 addr=0x4D port_sel=0x0 flags = 0x0 cmd=0x0 size=2 Conditions: Faulty sensor on an ES+ linecard of a C7600 Series Router. Workaround: None. Further Problem Description: This SW fix corrects the reporting of an invalid sensor. Under the same circumstances, 'NO' (Not Operational) will be reported instead of 128 degC. |
IOS fix for handling the Power calculation issues with ES+ Combo cards | |||
CSCtn41667 | R | 3 | c7600-es-platform |
15.1(02.09)S 15.1(02)S01 15.1(02)S0.7 15.1(02)S0.6 15.0(01)S3.4 15.0(01)S04 12.2(33.03.06)SRE 12.2(33)SRE04 | |||
Symptom: Following ES+ PIDS consume more power than the expected values. 76-ES+XC-20G3C 76-ES+XC-20G3CXL 76-ES+XC-40G3C 76-ES+XC-40G3CXL This might lead to situation of other modules getting powered down due to "power deny" . Conditions: Specific to ES+XC variants (Combo cards) of Cisco 7600 Series Routers. Workaround: Configure power redundancy-mode combined until the IOS is upgraded to a release with correct power settings. |
Fix LC inlet temp issue (ES+XC) and Alarm handling issues (All ES+) | |||
CSCtn68668 | R | 3 | c7600-es-platform |
15.1(02.12)S 15.1(02)S01 15.1(02)S0.6 15.0(01)S3.4 15.0(01)S04 12.2(33.03.09)SRE 12.2(33)SRE04 | |||
Symptoms: The following symptoms are observed:
1. The STATUS LED on the line card faceplate is amber.
2. The "remote command module <module> show platform hardware environment temperature" command reports high line card inlet temperature:
Router#remote command mod 1 show plat hard env temp
----------------------------------------------------------
Temperature and Threshold Table
----------------------------------------------------------
Sensor        Minor      Major      Current
ID            Threshold  Threshold  Temperature
----------------------------------------------------------
BB Outlet 0   60         75         47
BB Inlet 0    50         65         27
BB Outlet 1   75         85         54
BB Inlet 1    50         65         32
PE Outlet     60         75         53
PE Inlet      50         65         34
LC Outlet     60         75         49
LC Inlet      50         65         50   <<<<<<<<
Conditions: This issue is specific to the following Cisco 7600 ES+ combo cards: 76-ES+XC-20G3C 76-ES+XC-20G3CXL 76-ES+XC-40G3C 76-ES+XC-40G3CXL. The line card inlet sensor is inappropriately positioned in a place where temperatures are higher than at the inlet point.
Workaround: There is no workaround.
Further Problem Description: There are no problems with the functioning of the board. Only the external communication is affected. "BB Inlet 1" shows the actual inlet temperature. It can be used for reliable measurement of line card inlet temperature. |
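The threshold comparison in the sample output above can be automated. The following is a minimal sketch, not a Cisco tool: it parses rows of the temperature table and lists the sensors whose current reading has reached the minor threshold. The function names and the `>=` comparison are assumptions made for illustration; the comparison was chosen because it matches the sample, where LC Inlet reads 50 against a minor threshold of 50.

```python
import re

# One table row: sensor name, minor threshold, major threshold, current reading.
# A trailing "<<<" marker, as in the sample output, is tolerated.
ROW = re.compile(
    r"^(?P<sensor>.+?)\s+(?P<minor>\d+)\s+(?P<major>\d+)\s+(?P<current>\d+)\s*<*\s*$"
)

# Rows copied from the sample output in CSCtn68668.
SAMPLE = """\
BB Outlet 0 60 75 47
BB Inlet 0 50 65 27
BB Outlet 1 75 85 54
BB Inlet 1 50 65 32
PE Outlet 60 75 53
PE Inlet 50 65 34
LC Outlet 60 75 49
LC Inlet 50 65 50 <<<<<<<<
"""


def parse_rows(text):
    """Return (sensor, minor, major, current) tuples; non-matching lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append(
                (m["sensor"], int(m["minor"]), int(m["major"]), int(m["current"]))
            )
    return rows


def sensors_at_or_above_minor(text):
    """Assumed rule: flag a sensor when current >= minor threshold."""
    return [sensor for sensor, minor, _major, current in parse_rows(text) if current >= minor]


print(sensors_at_or_above_minor(SAMPLE))
```

On an ES+XC combo card, a result that contains only "LC Inlet" while "BB Inlet 1" stays well below its threshold is consistent with the sensor placement issue described in this bug.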
ES+: DEV_SELENE XAUI_LEN; FIFO_FULL; XAUI_GNT and XAUI_MIN errors | |||
CSCtq07626 | R | 3 | c7600-es-platform |
15.2(00.01)S 15.1(03)S 15.1(02.16)S0.3 15.1(02)S1.4 15.1(02)S02 15.0(01)S3.8 15.0(01)S04 12.2(33.06.01)SRD 12.2(33.03.17)SRE 12.2(33)SRE04 12.2(33)SRD07 | |||
Symptom: Errors detected by selene ASIC: %DEV_SELENE-DFC1-3-XAUI_LEN %DEV_SELENE-DFC1-3-FIFO_FULL %DEV_SELENE-DFC1-3-XAUI_GNT %DEV_SELENE-DFC1-3-XAUI_MIN Conditions: Observed on ES+ linecards of Cisco 7600 Series Routers. Workaround: None. Further Problem Description: The listed error types are not HW failures. Instead of being reported through error messages, occurrences of these errors can be tracked through the CLI: remote command module <module> show platform hardware drops. |
ES+: Watchdog resets fail to write crashinfo; causing Keep Alive failure | |||
CSCtr74953 | R | 3 | c7600-es-platform |
15.2(01)S 15.1(03)S01 15.1(03)S0.6 15.0(01)S4.13 15.0(01)S05 12.2(33.04.11)SRE 12.2(33)SRE05 | |||
Symptom: %OIR-SP-3-PWRCYCLE: Card in module 1, is being power-cycled off (Module not responding to Keep Alive polling) %C7600_PWR-SP-4-DISABLED: power to module in slot 1 set off (Module not responding to Keep Alive polling) There is no crashinfo file created. Conditions: Observed on ES+ linecards of Cisco 7600 Series Routers. This bug is specific to a condition where no other explanations exist for the failure of Keep Alive polling. Workaround: There is no workaround. Further Problem Description: This fix does not prevent the line card crash, but it prevents the silent crash. This fix ensures that a crashinfo will be written on the ES+ line card flash disk. It also ensures that the line card is reset as soon as the error condition is detected, as opposed to waiting for a Keep Alive failure. |
ES+: PCI read hang causes Keep Alive failure; fails to write crashinfo | |||
CSCts25729 | R | 3 | c7600-es-platform |
15.2(01.02)S 15.2(01)S 15.2(00.18)S0.3 15.1(03)S1.3 15.1(03)S02 12.2(33.04.14)SRE 12.2(33)SRE05 | |||
Symptom: %OIR-SP-3-PWRCYCLE: Card in module 1, is being power-cycled off (Module not responding to Keep Alive polling) %C7600_PWR-SP-4-DISABLED: power to module in slot 1 set off (Module not responding to Keep Alive polling) There is no crashinfo file created. Conditions: Observed on ES+ linecards of Cisco 7600 Series Routers. This bug is specific to a condition where no other explanations exist for the failure of Keep Alive polling. Workaround: There is no workaround. Further Problem Description: This fix does not prevent the line card crash, but it prevents the silent crash. This fix ensures that a crashinfo and mini crashinfo will be written on the ES+ line card flash disk. It also ensures that the line card is reset as soon as the error condition is detected, as opposed to waiting for a Keep Alive failure. |
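When reviewing a large syslog, it can help to separate Keep Alive power-cycles from resets that have another stated reason. The sketch below is an illustration only; the function name and the way the reason text is matched are assumptions, and the two sample lines are taken from messages quoted in this document.

```python
import re

# Matches the OIR power-cycle message and captures the module number and
# the reason given in parentheses. Case is ignored because the message is
# quoted both as "power-cycled off" and "power-cycled Off".
PWRCYCLE = re.compile(
    r"%OIR-SP-3-PWRCYCLE: Card in module (?P<module>\d+), "
    r"is being power-cycled off \((?P<reason>[^)]*)\)",
    re.IGNORECASE,
)


def keepalive_power_cycles(log_lines):
    """Return module numbers that were power-cycled for not answering Keep Alive polling."""
    modules = []
    for line in log_lines:
        m = PWRCYCLE.search(line)
        if m and "keep alive" in m["reason"].lower():
            modules.append(int(m["module"]))
    return modules


LOG = [
    "%OIR-SP-3-PWRCYCLE: Card in module 1, is being power-cycled off "
    "(Module not responding to Keep Alive polling)",
    "%OIR-SP-3-PWRCYCLE: Card in module 2, is being power-cycled Off "
    "(Module Reset due to exception or user request)",
]

print(keepalive_power_cycles(LOG))
```

For each module returned, the next step suggested by the bug descriptions is to check whether a crashinfo file exists on that line card's flash disk.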
ES+: Ingress traffic will not pass in some cases | |||
CSCtt13344 | R | 3 | c7600-es-platform |
15.2(01.11)S 15.2(01)S 15.2(00.18)S0.11 12.2(33.05.06)SRE 12.2(33)SRE06 | |||
Symptom: Packet silently dropped. Conditions: Observed on 7600 Series Router ES+ linecards. There are no specific conditions for this. Symptom is more likely to be observed on very busy NPs with Gigabit interfaces (as opposed to TenGig), with MTU explicitly configured. Workaround: Available to TAC. Further Problem Description: Complete fix available through CSCty22112. |
ES+ not sending multicast traffic on EVC | |||
CSCty51740 | D | 3 | c7600-es-platform |
Symptom: Packet silently dropped. Conditions: Observed on 7600 Series Router ES+ linecards. There are no specific conditions for this. Symptom is more likely to be observed on very busy NPs with Gigabit interfaces (as opposed to TenGig), with MTU explicitly configured. Workaround: Available to TAC. |
ES+: Machine check exception crash (vector 200) (Ref: CSCub39296 also) | |||
CSCua51760 | R | 3 | c7600-es-platform |
15.3(00.15)S 15.1(03)S3.15 15.1(03)S04 | |||
Symptom: Unexpected exception to CPU: vector 200, PC = 0x0 Traceback decode is irrelevant. Conditions: Observed on ES+ series linecards of Cisco 7600 Series Routers. Workaround: There is no workaround. |
ES+: TCAM_MGR_HW_ERR: TCAM device had corrupted data errors | |||
CSCtc17311 | R | 2 | c7600-es-platform |
12.2(33.03.12)SRD 12.2(33.01.06)MCP07 12.2(33)SRE01 12.2(33)SRD04 | |||
Symptoms: TCAM device is reporting corrupted data: %X40G-DFC2-3-TCAM_MGR_HW_ERR: GTM HW ERROR: TCAM device had corrupted data, the error is corrected for channel ... Conditions: Observed on ES+ linecards of Cisco 7600 Series Routers, by a background TCAM consistency checker. Workaround: There is no workaround. Further Problem Description: These messages can safely be ignored as the entries are already corrected. |
ES+: ECC_DOUBLE: Double-bit ECC error detected on NP - High T; Normal V | |||
CSCtd66014 | R | 2 | c7600-es-platform |
12.2(33.03.14)SRD 12.2(33.01.07)MCP07 12.2(33)ZI 12.2(33)SRE01 12.2(33)SRE00a 12.2(33)SRD04 12.2(32.00.11)SRE | |||
Symptoms: ES+ line card crashes at powerup of a Cisco 7600 router that is running Cisco IOS 12.2SRE image if either the Traffic Manager or Frame memories in the ES+ Network processors report a double bit ECC error. The ES+ line card crashinfo will have the following string: %NP_DEV-DFC2-3-ECC_DOUBLE: Double-bit ECC error detected on NP 0, Mem 19, SubMem 0x1,SingleErr 1, DoubleErr 1 Count 1 Total 1 Conditions: Router reloads, OIR of ES+ cards, system environment temperatures that slowly vary around an ambient temperature of about 30 degreesC. This happens at system powerup. We have seen double bit ECC problems reported after a few hours of traffic if the ambient temperatures vary around 30 degreesC. Workaround: No configuration workaround is available. The line card will reset itself and will be operational in the second reload. |
ES+: ECC_SINGLE or ECC_DOUBLE error detected on NP | |||
CSCtd99244 | R | 2 | c7600-es-platform |
12.2(33.03.15)SRD 12.2(33.01.07)MCP07 12.2(33)ZI 12.2(33)SRE01 12.2(33)SRE00a 12.2(33)SRD04 12.2(32.00.11)SRE | |||
Symptoms: 7600 series router with ES+ line card crashes reporting single bit or double bit ECC error. %NP_DEV-DFC2-3-ECC_SINGLE: Single-bit ECC error detected on NP 0, Mem 18, SubMem 0x1,SingleErr 1, DoubleErr 0 Count 1 Total 1 %NP_DEV-DFC2-3-ECC_DOUBLE: Double-bit ECC error detected on NP 0, Mem 19, SubMem 0x1,SingleErr 1, DoubleErr 1 Count 1 Total 1 Conditions: Symptom observed on ES+ linecard of C7600 series routers, usually in the initial phases of line card bootup, but this has also been reported after a few hours of traffic through the ES+ line card ports. Workaround: There is no workaround. Further Problem Description: Software fix is available in : 12.2(33)SRD5 or higher 12.2(33)SRE2 or higher 15.0(1)S or higher If symptom persists after IOS upgrade please contact Cisco TAC. |
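Several bugs in this list differ only in the NP memory number reported in the ECC message, so it can be useful to extract the fields before searching. The following is a minimal sketch that assumes the message layout quoted above; the `EccEvent` type and `parse_ecc` function are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class EccEvent(NamedTuple):
    dfc: int      # number that follows "DFC" in the message prefix
    kind: str     # "SINGLE" or "DOUBLE"
    np: int       # network processor index
    mem: int      # memory number
    submem: int   # sub-memory, parsed from its hex form
    count: int
    total: int


# Layout assumed from the messages quoted in this document.
ECC = re.compile(
    r"%NP_DEV-DFC(?P<dfc>\d+)-3-ECC_(?P<kind>SINGLE|DOUBLE): .*?"
    r"on NP (?P<np>\d+), Mem (?P<mem>\d+), SubMem (?P<submem>0x[0-9a-fA-F]+),\s*"
    r"SingleErr \d+, DoubleErr \d+ Count (?P<count>\d+) Total (?P<total>\d+)"
)


def parse_ecc(line: str) -> Optional[EccEvent]:
    """Return the parsed ECC event, or None if the line is not a full ECC message."""
    m = ECC.search(line)
    if m is None:
        return None
    return EccEvent(
        dfc=int(m["dfc"]),
        kind=m["kind"],
        np=int(m["np"]),
        mem=int(m["mem"]),
        submem=int(m["submem"], 16),
        count=int(m["count"]),
        total=int(m["total"]),
    )


line = (
    "%NP_DEV-DFC2-3-ECC_DOUBLE: Double-bit ECC error detected on NP 0, "
    "Mem 19, SubMem 0x1,SingleErr 1, DoubleErr 1 Count 1 Total 1"
)
print(parse_ecc(line))
```

The `mem` value can then be compared against the entries in this list that name a specific memory, such as Mem 16 (CSCth15790) and Mem 17 (CSCtn95122).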
ES+: ECC_DOUBLE: Double-bit ECC error detected on NP | |||
CSCtd99248 | R | 2 | c7600-es-platform |
12.2(33.03.15)SRD 12.2(33.01.07)MCP07 12.2(33)ZI 12.2(33)SRE01 12.2(33)SRE00a 12.2(33)SRD04 12.2(32.00.11)SRE | |||
Symptoms: On 7600 series routers with ES+ line cards, occasional double-bit ECC errors may be reported for the traffic manager and other metadata memories of the Network Processor on the ES+ line card. Example error message: %NP_DEV-DFC9-3-ECC_DOUBLE: Double-bit ECC error detected on NP 3, Mem 18, SubMem 0x1,SingleErr 1, DoubleErr 1 Count 1 Total 1 Conditions: This symptom is observed on router reloads, on OIR of ES+ cards, and when system environment temperatures slowly vary around an ambient temperature of about 30 degrees C. This happens at system power up. Double-bit ECC errors have also been reported after a few hours of traffic. Workaround: No configuration workaround is available. The line card will reset itself and will be operational in the second reload. Further Problem Description: Software fix is available in: 12.2(33)SRD5 or higher, 12.2(33)SRE2 or higher, 15.0(1)S or higher. If the symptom persists after IOS upgrade, please contact Cisco TAC. |
Invalid LinkFPGA or LINKFPGA Bus Error | |||
CSCte14535 | R | 2 | c7600-es-platform |
12.2(33.03.16)SRD 12.2(33.01.07)MCP07 12.2(33)SRE01 12.2(33)SRD04 12.2(32.00.13)SRE | |||
Symptom: Possible symptoms are: %FPD_MGMT-3-INVALID_IMG_VER: Invalid ... LinkFPGA .. image version detected for ... card in slot-dc ... %FPD_MGMT-6-UPGRADE_PASSED: ... LinkFPGA ... image in the ... card in slot-dc 7-2 has been successfully updated from version ?.? to version ... %C7600_ES-2-IOFPGA_IO_BUS_ERROR: C7600-ES Line Card IOFPGA IO LINKFPGA Bus Error Conditions: Observed during boot/reload of ES+ line card in Cisco 7600 Series Routers. Rare in normal working ES+ cards. Workaround: This fix is an enhancement which adds an additional recovery cycle for reading the LinkFPGA. Further Problem Description: The link FPGA should recover in the next recovery reload of the ES+. If the recovery does not happen after 3 consecutive times, then a persistent hardware fault may be the reason. Contact TAC for RMA procedures. |
Low-queue ES+: ECC_DOUBLE: Double-bit ECC error detected on NP; Mem 16 | |||
CSCth15790 | R | 2 | c7600-es-platform |
15.1(00.11)S 15.0(01)S 15.0(00.13)S0.9 12.2(33.04.17)SRD 12.2(33.03.25)SRD 12.2(33)SRE02 12.2(33)SRD05 12.2(32.00.36)SRE | |||
Symptoms: %NP_DEV-DFC9-3-ECC_DOUBLE: Double-bit ECC error detected on NP 3, Mem 16, SubMem 0x1,SingleErr 1, DoubleErr 1 Count 1 Total 1 Conditions: Symptom observed on Low-queue ES+ line cards (ES+T) of C7600 series routers, in NP Mem 16. Workaround: There is no workaround. Further Problem Description: If symptom persists after IOS upgrade please contact Cisco TAC. |
ENV-4-MINORTEMPALARM - updating the new temperature thresholds for ES+ | |||
CSCth25959 | R | 2 | c7600-es-platform |
15.1(00.09)S 15.0(01)S 15.0(00.13)S0.9 12.2(33.04.16)SRD 12.2(33.03.25)SRD 12.2(33)SRE02 12.2(33)SRD05 12.2(32.00.35)SRE | |||
Symptom: Temperature alarm (ENV-4-MINORTEMPALARM) is reported, with AMBER LED on the line card faceplate. Conditions: 7600 series router with any model of the ES+ line card. Workaround: No workaround. Further Problem Description: Temperature thresholds were set too low before this bug-fix. Correct settings are:
--------------------------------------------
Sensor        Minor      Major
ID            Threshold  Threshold
--------------------------------------------
BB Outlet 0   65         80
BB Outlet 1   70         85
--------------------------------------------
It is recommended to also evaluate the related bug CSCtn68668. |
Remove show platform hardware config-pld from show tech | |||
CSCti78408 | R | 2 | c7600-es-platform |
15.1(01.04)S 15.1(01)S 15.1(00.18)S0.4 15.0(01)S01 15.0(00.13)S0.24 12.2(33.04.29)SRD 12.2(33.02.05)SRE 12.2(33)SRE03 12.2(33)SRD05 | |||
Symptoms: %SYS-DFC4-3-CPUHOG: Task is running for (128000)msecs, more than (2000)msecs (4/3),process = console_rpc_server_action. %SYS-DFC4-2-WATCHDOG: Process aborted on watchdog timeout, process = console_rpc_server_action. Conditions: This issue can be hit under two conditions: 1. Issuing the "show tech" command on ES+ 2. Issuing "show platform hardware config-pld" on ES+ Workaround: Do not use "show tech" or "show platform hardware config-pld" on ES+. |
ES+: LONGBUSYREAD: C2W Interface busy for long time reading temp sensor | |||
CSCtr74529 | R | 2 | c7600-es-platform |
15.2(00.15)S 15.1(03)S01 15.1(03)S0.4 15.1(02)S1.10 15.1(02)S02 15.0(01)S4.4 15.0(01)S05 12.2(33.06.01)SRD 12.2(33.04.08)SRE 12.2(33)SRE05 12.2(33)SRD07 | |||
Symptoms: %ENVM-4-LONGBUSYREAD: C2W Interface busy for long time reading temperature sensor Conditions: Observed on ES+ linecard of Cisco 7600 Series Routers. Workaround: There is no workaround. |
Crash on ES+ on issuing show tech or show platform hardware version | |||
CSCtz30983 | R | 2 | c7600-es-platform |
15.3(00.01)S 15.2(02)S1.6 15.2(02)S02 15.1(03)S3.14 15.1(03)S04 15.0(01)S06 12.2(33.05.31)SRE 12.2(33)SRE07 | |||
Symptoms: Crash on ES+ line card upon issuing the "show hw-module slot X tech-support" or "show platform hardware version" command. This is similar to CSCti78408 but not to CSCti78408. Conditions: This symptom occurs on an ES+ line card. Workaround: Do not issue the "show hw-module slot X tech-support" or "show platform hardware version" command on an ES line card unless explicitly mentioned by Cisco. |
Link FPGA Update Failures with Different signatures | |||
CSCth20868 | C | 2 | c7600-esm-20 |
Symptom: ES+ card crashes with different failure messages during production. In most cases the initial message for the reload will be an FPD upgrade failure after multiple attempts. The crash messages will differ between bootup attempts and can be System Exception, FPD upgrade failure, or IOFPGA bus error. Message examples: Initial symptom: %FPD_MGMT-3-INVALID_IMG_VER: Invalid 20x1G LinkFPGA (FPD ID=7) image version detected for 7600-ES+20G card in slot-dc 7-2. IOFPGA bus error symptom: %C7600_ES-DFC7-2-IOFPGA_IO_BUS_ERROR: C7600-ES Line Card IOFPGA IO LINKFPGA Bus Error: and other system exceptions. Conditions: Symptom observed during boot-up of 7600-ES+ linecards. Workaround: None. |
ES+ Bridge ASIC get locked up and stops forwarding | |||
CSCtz51545 | C | 2 | c7600-mpls |
Symptom: ES+ stops forwarding traffic. Error messages like the ones below will be seen when hitting this problem:
*Mar 29 20:33:55.283: %FABRIC-SP-6-TIMEOUT_ERR: Fabric in slot 5 reported timeout error for channel 17 (Module 9, fabric connection 1)no
*Mar 29 20:33:55.507: %FABRIC_INTF_ASIC-DFC9-5-FABRICSYNC_DONE: Fabric ASIC 1 Channel 1: Fabric sync do
Conditions:
1) VRF configurations in conjunction with any of the below (and "mls mpls recir-agg" not configured):
i) Core paths are enabled with MPLS TE-FRR (or) BGP PIC (or) IP-FRR
ii) Presence of Distributed EtherChannel with SVI combination towards VRF access.
2) Presence of GREoMPLS cases with any of the below (and "mls mpls tunnel-recir" not configured):
i) Core paths are enabled with MPLS TE-FRR (or) BGP PIC (or) IP-FRR
The issue is not always seen; it appears intermittently with the above configurations due to transient conditions.
Workaround: For condition 1, enable "mls mpls recir-agg". For condition 2, enable "mls mpls tunnel-recir". If neither of the above resolves the traffic blackholing, auto recovery can be enabled using the CLI "hw-module slot mp-recovery-enable". This CLI is available from the 15.2(02.18)S release. |
ES+ ROMMON: MPC8548 DDR20 errata fix for Multi-bit ECC errors | |||
CSCtb76621 | R | 3 | c7600-system |
12.2(33r)SRD07 12.2(32.08.27)REC186 | |||
Symptom: %C6K_MEM_ECC-DFCx-2-MBE: Multiple bit error detected at ... %C6K_MEM_ECC-DFCx-3-SYNDROME_MBE: 8-bit Syndrome for the detected Multi-bit error: ... %C7600_MEM_ECC-DFCx-2-MBE: Multiple bit error detected at ... %C7600_MEM_ECC-DFCx-3-SYNDROME_MBE: 8-bit Syndrome for the detected Multi-bit error: ... Conditions: Observed on ES+ line card of Cisco 7600 Series Router. Workaround: There is no workaround. Further Problem Description: This fix is integrated in the 12.2(33r)SRD7 ROMMON image for the ES+ card. The SRD7 ROMMON image is bundled into the IOS package for Cisco 7600 Series Routers starting from 15.0(1)S. Cisco 7600 Series Routers running an image from the 12.2(33)SRD or 12.2(33)SRE version may also run the SRD7 ROMMON. If affected by this issue, contact Cisco TAC and request the 12.2(33r)SRD7 image. Please refer to this link for the ROMMON upgrade procedure: http://www.cisco.com/en/US/docs/routers/7600/rommon/rsp720_rommon.html#wp180816 |
ES+ Metropolis lock-up: Recovery fix. | |||
CSCty93833 | R | 3 | c7600-system |
15.2(02.18)S 15.2(02)S1.9 15.2(02)S02 15.1(03)S3.14 15.1(03)S04 12.2(33.06.09)SRE 12.2(33)SRE07 | |||
Symptom: Metropolis on ES+ line card gets stuck (stops processing packets), leading to traffic loss and in turn service impact. Conditions: For the problem to surface, all of the following conditions must be present. 1. The re-write instruction from Earl *must* involve removal as well as addition of a few bytes. 2. The instruction to remove bytes *must* occur as a result of a look-up in the VPN CAM (an internal block within Earl). 3. The packet size *must* be such that the re-write causes the packet to spill over into the next line (thus over-writing the start of the next packet). 4. The re-write instruction *must* also force the packet to be re-circulated. All four conditions above *must* be satisfied for the lock-up to occur. Workaround: 1. Eliminating any of the above conditions will solve the problem. 2. Apply earl-patch to recover. |
ES+ : Distinguish Earl Inlet and outlet temp reading during error case | |||
CSCua78310 | D | 3 | c7600-system |
Symptom: EARL Inlet and Outlet Sensor on ES+ card report the same temperature reading. Conditions: None. Workaround: None. |
[obsolete] ES+ : Incorrect temperature Displayed in EARL INLET/OUTLET | |||
CSCua78386 | R | 3 | c7600-system |
15.3(01)S 15.3(00.18)S | |||
Symptom: Incorrect temperature displayed in EARL INLET/OUTLET. The displayed temperature may get stuck at the erroneous value. Conditions: Occurs on ES+ LC. Seems to be due to a spurious write to the sensor device register. Workaround: None. |
ES+ : Incorrect temperature Displayed in EARL INLET/OUTLET | |||
CSCub74451 | R | 3 | c7600-system |
15.3(01.01)S 15.2(04)S01 15.2(04)S0.5 12.2(33.07.02)SRE 12.2(33)SRE07a | |||
Symptoms: EARL inlet/outlet displays incorrect temperature values. If the temperature crosses the minor/major threshold false alarms will be generated. In case of a major alarm the linecard will shut down as a preventive measure. Conditions: There is no trigger for the issue. Workaround: Reload the linecard. |
Initialize TM external frame and control memories in ES+ | |||
CSCtq85884 | R | 3 | c7600-system |
15.2(02.09)S 15.2(02)SNG 15.2(02)S01 15.2(02)S0.1 15.2(01)S02 15.1(03)S2.18 15.1(03)S03 15.1(03)MR 15.0(01)S5.10 15.0(01)S06 12.2(33.05.26)SRE 12.2(33)SRE07 | |||
Symptom: ECC double bit error followed with a line card crash. Sample symptom: %NP_DEV-DFC5-3-ECC_DOUBLE: Double-bit ECC error detected on NP 1, Mem 17, SubMem 0x1,SingleErr 1, DoubleErr 1 Count 1 Total 1 Condition: Reported by ES+ line card on Cisco 7600 Series Routers, during the boot of the linecard. Workaround: None. Further Problem Description: This fix is introducing an additional memory check during line card boot. As a consequence, it may expose during line card boot some previously undetected failures. These failures would otherwise restart the line card during normal operation. |
ES+: silent packet drops | |||
CSCty22112 | R | 3 | c7600-system |
15.2(02.14)S 15.2(02)S01 15.2(02)S0.1 15.2(01)S1.6 15.2(01)S02 15.1(03)S2.16 15.1(03)S03 15.0(01)S5.14 15.0(01)S06 12.2(33.05.25)SRE 12.2(33)SRE06 | |||
Symptom: Packet silently dropped. Conditions: Observed on 7600 Series Router ES+ linecards. There are no specific conditions for this. Symptom is more likely to be observed on very busy NPs with Gigabit interfaces (as opposed to TenGig), with MTU explicitly configured. Workaround: Please refer to Note to TAC enclosure for the workaround |
ES+: FABRICCRCERRS after SSO due to Metropolis lockup | |||
CSCto55567 | R | 2 | c7600-system |
15.2(00.06)S 15.1(03)S 15.1(02.16)S0.4 15.0(01)S3.12 15.0(01)S04 12.2(33.04.05)SRE 12.2(33)SRE05 | |||
Symptoms: Line card reports fabric errors: %FABRIC_INTF_ASIC-DFC9-4-FABRICCRCERRS: Fabric ASIC 0: 322 Fabric CRC error events in 100ms period Also, TestMacNotification and TestFabricCh0Health diagnostic tests are failing. Conditions: Symptom is observed on ES+ line cards of C7600 Series Routers after SSO with multicast traffic flowing through the line card. Workaround: Soft reload the line card using the hw-module module <module> reset exec command. |
ES+: single occurrence of DEV_SELENE XAUI_CODE error | |||
CSCtr37182 | R | 2 | c7600-system |
15.2(00.12)S 15.1(03)S 15.1(02.16)S0.13 15.0(01)S3.14 15.0(01)S04 12.2(33.06.01)SRD 12.2(33.04.07)SRE 12.2(33)SRE05 12.2(33)SRD07 | |||
Symptoms: Single occurrence of XAUI_CODE and XAUI_RX_RDY message in the syslog: %DEV_SELENE-DFC1-3-XAUI_CODE: Selene 1 XAUI 1 Coding Error %DEV_SELENE-DFC1-3-XAUI_RX_RDY: Selene 1 XAUI 1 Rx Rdy changed state Conditions: This symptom is observed on ES+ linecards of Cisco 7600 series router. Workaround: There is no workaround. Further Problem Description: Single occurrence of this error can safely be ignored. |
ISSU Standby reset due to MCL failure "hw-module slot1reset-recycle-bu" | |||
CSCtu30649 | R | 2 | c7600-system |
15.2(01.15)S 15.2(01)S 15.2(00.18)S0.13 15.1(03)S1.7 15.1(03)S02 15.0(01)S4.20 15.0(01)S05 12.2(33.05.09)SRE 12.2(33)SRE06 | |||
Symptoms: Standby is reset. Conditions: This issue is seen when the ISSU standby is reset because of MCL failure. Workaround: There is no workaround. |
Enhance the temperature alarm detection logic on ES+ cards | |||
CSCtz43626 | R | 2 | c7600-system |
15.3(00.18)S 15.2(04)S01 15.2(04)S0.5 15.1(03)S4.14 12.2(33.07.02)SRE 12.2(33)SRE07a | |||
Symptoms: Minor or major temperature alarms reported in the syslog: %C7600_ENV-SP-4-MINORTEMPALARM: module 2 aux-1 temperature crossed threshold #1(=60C). It has exceeded normal operating temperature range. %C7600_ENV-SP-4-MINORTEMPALARM: EARL 2/0 outlet temperature crossed threshold #1(=60C). It has exceeded normal operating temperature range. Conditions: The symptom is observed on ES+ series linecards of Cisco 7600 series routers. Specifically, the reported temperature will be far off from reading of other sensors on the linecard. Workaround: There is no workaround. |
ES+: Machine check exception crash (vector 200) | |||
CSCub39296 | R | 2 | c7600-system |
15.3(01.02)S 15.3(01)S0.7 15.3(01)S 15.3(00.20)S0.3 15.2(04)S1.3 15.2(04)S02 15.1(03)S4.12 12.2(33.07.02)SRE 12.2(33)SRE07a | |||
Symptoms: Unexpected exception to CPU: vector 200, PC = 0x0. Traceback decode is irrelevant. Conditions: The symptom is observed on the ES+ series linecards on a Cisco 7600 series router. Symptom is reported on the ES+ console and in the crashinfo file on the ES+ flash disk. It is not reported in the syslog. Workaround: There is no workaround. |
ES+: ECC_DOUBLE: Double-bit ECC error detected on NP; Mem 17 | |||
CSCtn95122 | R | 2 | c7600-system |
15.2(00.09)S 15.1(03)S 15.1(02.16)S0.9 15.0(01)S3.12 15.0(01)S04 12.2(33.04.07)SRE 12.2(33)SRE05 | |||
Symptoms: The ECC double-bit error is reported in syslog, followed with a linecard crash: %NP_DEV-DFC5-3-ECC_DOUBLE: Double-bit ECC error detected on NP ... Mem 17 Conditions: Observed on ES+ linecards of C7600 Series Routers when heavy configuration changes are applied to the linecard. In addition, there are other unknown race conditions that can cause this. This bug-fix is specific to Double-bit errors on Mem 17. Workaround: There is no workaround. |
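As a rough aid for matching a syslog line against the entries above, the message mnemonic can be used as a lookup key. The sketch below is not an official tool and not exhaustive: the `KNOWN` table only covers mnemonics quoted in this document, the function name is hypothetical, and a match only narrows the search; the symptom and conditions of each bug still need to be compared by hand.

```python
import re

# IOS syslog prefix: %FACILITY-[SOURCE-]SEVERITY-MNEMONIC:
# SOURCE is the optional part such as "SP" or "DFC4".
SYSLOG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?:(?P<source>[A-Za-z0-9]+)-)?"
    r"(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):"
)

# Mnemonic -> bug IDs from this document that quote it.
_XAUI = ["CSCtq07626"]
_LINKFPGA = ["CSCte14535", "CSCth20868"]
KNOWN = {
    "TCAM_MGR_HW_ERR": ["CSCsz04660", "CSCtc17311"],
    "DBUS_HDR_ERR": ["CSCtg31984"],
    "ECC_SINGLE": ["CSCtd99244"],
    "ECC_DOUBLE": [
        "CSCth11714", "CSCtd66014", "CSCtd99244", "CSCtd99248",
        "CSCth15790", "CSCtq85884", "CSCtn95122",
    ],
    "XAUI_LEN": _XAUI,
    "FIFO_FULL": _XAUI,
    "XAUI_GNT": _XAUI,
    "XAUI_MIN": _XAUI,
    "XAUI_CODE": ["CSCtr37182"],
    "XAUI_RX_RDY": ["CSCtr37182"],
    "LONGBUSYREAD": ["CSCtr74529"],
    "FABRICCRCERRS": ["CSCto55567"],
    "IOFPGA_IO_BUS_ERROR": _LINKFPGA,
    "INVALID_IMG_VER": _LINKFPGA,
    "MBE": ["CSCtb76621"],
    "MINORTEMPALARM": ["CSCth25959", "CSCtz43626"],
}


def candidate_bugs(line):
    """Return bug IDs worth reviewing for this syslog line, or [] if none are listed."""
    m = SYSLOG.search(line)
    if m is None:
        return []
    return list(KNOWN.get(m["mnemonic"], []))


print(candidate_bugs("%ENVM-4-LONGBUSYREAD: C2W Interface busy for long time"))
print(candidate_bugs("%X40G-DFC4-3-TCAM_MGR_HW_ERR: GTM HW ERROR"))
```

If the lookup returns nothing, the guidance at the top of this document still applies: search the Bug Toolkit before opening a TAC Service Request.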
Hello Aleksandar,
One of the customers is experiencing the issue described in CSCsw31515. Is it possible that the root cause is h/w? How can you tell the difference between h/w and transient s/w issues?
Thanks, kind regards,
Eduard
Hi Eduard,
I have tried to reply via email back in Nov but seems the email didn't get through.
I suppose you are referring to ECC double-bit errors. There is no direct way to distinguish between a HW and SW issue. As a general recommendation I can only ask you to upgrade to an IOS release which has all the relevant fixes included. If your customer has experienced repeated ECC errors on multiple linecards, please do open a Service Request.
Regards,
Aleksandar
Also related:
CSCud19230
Symptoms: ES+ line card reload occurs with the following error messages:
%PM_SCP-SP-1-LCP_FW_ERR: System resetting module 2 to recover from error: x40g_iofpga_interrupt_handler: LINKFPGA IOFPGA IO Bus Err val: 4214784 Bus Error Add:332 Bus Err data: 0
%OIR-SP-3-PWRCYCLE: Card in module 2, is being power-cycled Off (Module Reset due to exception or user request)
%C7600_PWR-SP-4-DISABLED: power to module in slot 2 set Off (Module Reset due to exception or user request)
Conditions: This symptom is observed with the ES+ line card.
Workaround: There is no workaround.
Fixed in 12.2(33)SRE8 and recent 15.x rebuilds.
Hi Aleksandar,
we had following issue.
C7609 running 12.2.33 SRE7a and multiple ES+ LC's had a failure on 7600-ES20-GE3CXL where
the first 10 port did not forward any traffic. A reload solved the issue.
It looks like that the first 10XGE Port had a problem.
I could not find any onboard logs or errors.
What could be the issue?
Seems like an issue related to one NP. You could have just reset or reseat that particular linecard instead of reloading the chassis.
Anyway, your card is EOL/EOS with End of SW Maintenance dating back to March, 2012. Afaik, those EOL'ed ES (non-plus) cards have at least one unfixed serious software issue (crashes). I would throw them out as soon as possible.
http://www.cisco.com/c/en/us/products/collateral/routers/7600-series-routers/eol_c51_577514.html
Hi Lukas,
we did a reload of the LC, not chassis. We had similar problems on other hardware and a reload is always performed because it helps most of the time and it's much faster than typing show commands.
It would be nice if I could find a bug that matches this issue, even if it's after March 2012.
Dear Aleksandar,
we are using 76-ES+T-4TG card in slot 9.
Interface 9/1 faces the core router and 9/3 faces the customer. We configured a 900-entry ACL inbound on interface 9/1 to protect our customers from attacks.
Now we have found a problem: pinging a directly connected Layer 3 neighbor from the 7609 is OK,
but when we ping customers behind interface 9/3 from the upstream Layer 3 core router, packets are lost. We found that packets arrive at interface 9/1 but are not all sent out of interface 9/3: of 100 ping packets, only 91 were sent out of interface 9/3 to customers.
ios version is sup-bootdisk:c7600rsp72043-advipservices-mz.122-33.SRD4.bin
thank you
Tom