Cisco ACE modules sit inside Cisco Catalyst 6500 Series Switches and Cisco 7600 Series Routers to provide a high level of load balancing and application delivery. ACE modules have robust software and hardware that make it possible to handle a high volume of traffic in real time.
The ACE module crashed unexpectedly. No error message or problem log was found, and no specific reload reason was recorded. The module had been working fine for a long time, then abruptly crashed and reloaded. It has been working fine since then and shows no problems.
The following log is seen on the backup module:
Aug 27 11:29:24 PDT: %FABRIC-SP-6-TIMEOUT_ERR: Fabric in slot 5 detected excessive flow-control on channel 8 (Module 9, fabric connection 0)
Aug 27 11:34:26 PDT: %OIR-SP-3-PWRCYCLE: Card in module 9, is being power-cycled off (Reset - Module Reloaded During Download)
Aug 27 11:34:27 PDT: %C6KPWR-SP-4-DISABLED: power to module in slot 9 set off (Reset - Module Reloaded During Download)
Aug 27 11:34:39 PDT: %OIR-SP-3-PWRCYCLE: Card in module 9, is being power-cycled off (Module not responding to Keep Alive polling)
Aug 27 11:34:39 PDT: %C6KPWR-SP-4-DISABLED: power to module in slot 9 set off (Module not responding to Keep Alive polling)
Aug 27 11:39:34 PDT: %DIAG-SP-6-RUN_MINIMUM: Module 9: Running Minimal Diagnostics...
Aug 27 11:39:34 PDT: %DIAG-SP-6-DIAG_OK: Module 9: Passed Online Diagnostics
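When reviewing output like the log above, it can help to pull out the severity, the message mnemonic, the module number, and the reset reason quoted in parentheses. The following is a minimal Python sketch of that idea, not a Cisco tool. The regular expressions and the helper name `parse_line` are assumptions made for illustration, and they are only checked against the lines shown above.

```python
import re

# Log lines copied from the output above.
LOG = [
    "Aug 27 11:29:24 PDT: %FABRIC-SP-6-TIMEOUT_ERR: Fabric in slot 5 detected excessive flow-control on channel 8 (Module 9, fabric connection 0)",
    "Aug 27 11:34:26 PDT: %OIR-SP-3-PWRCYCLE: Card in module 9, is being power-cycled off (Reset - Module Reloaded During Download)",
    "Aug 27 11:34:27 PDT: %C6KPWR-SP-4-DISABLED: power to module in slot 9 set off (Reset - Module Reloaded During Download)",
    "Aug 27 11:34:39 PDT: %OIR-SP-3-PWRCYCLE: Card in module 9, is being power-cycled off (Module not responding to Keep Alive polling)",
    "Aug 27 11:34:39 PDT: %C6KPWR-SP-4-DISABLED: power to module in slot 9 set off (Module not responding to Keep Alive polling)",
    "Aug 27 11:39:34 PDT: %DIAG-SP-6-RUN_MINIMUM: Module 9: Running Minimal Diagnostics...",
    "Aug 27 11:39:34 PDT: %DIAG-SP-6-DIAG_OK: Module 9: Passed Online Diagnostics",
]

# Assumed layout: %FACILITY[-SUBFACILITY]-SEVERITY-MNEMONIC: text
SYSLOG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)"
    r"(?:-(?P<subfacility>[A-Z0-9_]+))?"
    r"-(?P<severity>[0-7])"
    r"-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)
MODULE_RE = re.compile(r"[Mm]odule (?:in slot )?(\d+)")
TRAILING_PAREN_RE = re.compile(r"\(([^()]*)\)\s*$")


def parse_line(line):
    """Return a dict describing one syslog line, or None if it does not match."""
    match = SYSLOG_RE.search(line)
    if match is None:
        return None
    event = match.groupdict()
    event["severity"] = int(event["severity"])
    module = MODULE_RE.search(event["text"])
    event["module"] = int(module.group(1)) if module else None
    detail = TRAILING_PAREN_RE.search(event["text"])
    event["detail"] = detail.group(1) if detail else None
    return event


events = [parse_line(line) for line in LOG]

# Reset reasons reported by the power-cycle messages, in order of appearance.
reset_reasons = [e["detail"] for e in events if e["mnemonic"] == "PWRCYCLE"]

if __name__ == "__main__":
    for reason in reset_reasons:
        print(reason)
```

Here the only reasons recorded are generic ones ("Module Reloaded During Download", "Module not responding to Keep Alive polling"), which is consistent with a reload that left no specific cause behind.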
Normally, any event that causes the module to reload is logged and reported as the reload reason. However, certain situations can make even the most basic operation impossible, causing the module to reload without showing any reason. Almost always this is related to an SRAM parity error. The SRAM parity error, which can be seen in the core file, is not due to a software issue, although there were software-related issues in earlier code versions.
SRAMs are very sensitive to light, dust, radiation, shock, and temperature, so it is possible to get an SRAM parity error on a healthy ACE. The issue is the result of a "bit-flip" within the SRAM itself, which can occur as a result of environmental conditions. The bit-flip is cleared by a simple reboot of the system, which occurs along with the generation of the core file. ACE is susceptible to this because it uses SRAM to store control information and packet data rather than as scratch-pad storage. Almost any 1-bit flip will be detected as a parity error. This is a known limitation of SRAM, and all equipment makers face the same issue with this type of memory, so an occasional parity error is to be expected.
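To illustrate why a single flipped bit shows up as a parity error, here is a minimal Python sketch of even-parity checking. It is a generic illustration only: the 32-bit word width, the single parity bit per word, and the function names are assumptions, not a description of the actual ACE SRAM design.

```python
WORD_BITS = 32  # assumed word width, for illustration only


def parity_bit(word):
    """Even-parity bit: 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") % 2


def store(word):
    """Model a memory write: keep the word together with its parity bit."""
    return word, parity_bit(word)


def check(word, stored_parity):
    """Model a memory read: True if the word still matches its stored parity."""
    return parity_bit(word) == stored_parity


def flip_bit(word, position):
    """Model an environmental bit-flip at the given bit position."""
    return word ^ (1 << position)


data, parity = store(0xDEADBEEF)   # arbitrary example value
corrupted = flip_bit(data, 7)      # one bit changes while the word sits in memory

if __name__ == "__main__":
    print("clean read ok:", check(data, parity))
    print("corrupted read ok:", check(corrupted, parity))
```

A read of the unmodified word passes the check, while the word with one flipped bit fails it no matter which bit position was hit. The check only reports that the word is bad. It cannot say which bit changed, so the data cannot be repaired in place, which fits the reload described above.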
The recommended action is to upgrade to A2(3.3) or later in order to fix all software-related SRAM bugs. Refer to bug CSCtc53046 for a partial software workaround that mitigates hardware-generated SRAM parity errors by reducing the amount of SRAM access caused by the collection of interface statistics. SRAM errors are expected to occur at a frequency of approximately one per year per ACE module. A single SRAM parity error does not justify an RMA. If a particular module experiences a significantly higher failure rate and is running A2(3.3) or later, then a proactive RMA is required.
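The guidance above can be written down as a small decision helper. This is only a sketch under stated assumptions: the text does not define "significantly higher", so the `RATE_MULTIPLIER` threshold is hypothetical, as are the function names, the version-string pattern, and the example dates.

```python
import re
from datetime import date

MIN_FIXED_VERSION = (2, 3, 3)    # A2(3.3), from the guidance above
EXPECTED_ERRORS_PER_YEAR = 1.0   # approximate expected rate, from the guidance above
RATE_MULTIPLIER = 3.0            # hypothetical meaning of "significantly higher"


def parse_ace_version(text):
    """Turn a string such as 'A2(3.3)' into a comparable tuple (2, 3, 3)."""
    match = re.fullmatch(r"A(\d+)\((\d+)\.(\d+)[a-z]?\)", text.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    return tuple(int(group) for group in match.groups())


def errors_per_year(error_dates, window_start, window_end):
    """Annualized count of SRAM parity errors seen inside the observation window."""
    days = (window_end - window_start).days
    if days <= 0:
        raise ValueError("observation window must be at least one day long")
    seen = [d for d in error_dates if window_start <= d <= window_end]
    return len(seen) * 365.0 / days


def recommend(version, error_dates, window_start, window_end):
    """Return 'upgrade', 'rma', or 'monitor' for one module."""
    if parse_ace_version(version) < MIN_FIXED_VERSION:
        return "upgrade"   # rule out the software-related SRAM bugs first
    rate = errors_per_year(error_dates, window_start, window_end)
    if len(error_dates) > 1 and rate >= EXPECTED_ERRORS_PER_YEAR * RATE_MULTIPLIER:
        return "rma"       # repeated errors on fixed code
    return "monitor"       # a single error does not justify an RMA


if __name__ == "__main__":
    # Hypothetical observation window and crash dates.
    start, end = date(2013, 1, 1), date(2013, 12, 31)
    print(recommend("A2(1.6)", [date(2013, 8, 27)], start, end))
    print(recommend("A2(3.3)", [date(2013, 8, 27)], start, end))
```

The threshold should be adjusted to whatever failure rate the support organization actually treats as abnormal.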