05-26-2015 05:43 AM - edited 03-03-2019 07:52 AM
Welcome to this Cisco Support Community Ask the Expert conversation. This is an opportunity to learn and ask Cisco expert Vinit Jain questions about router and IOS architecture, and about unexpected reboots on IOS routers such as the 7600, 2900, and 3900 that you might be facing in your environment and that can have a huge impact on your services.
Ask questions from Wednesday, May 27, 2015 to Tuesday, June 9, 2015.
Different routers have different architectures, capabilities, and features that you can use to enhance router performance and get certain tasks done. Reboots on routers happen mainly due to three factors:
Vinit will be helping you with all your queries on all of the above.
Vinit Jain will also be speaking at Cisco Live in June 2015 on Troubleshooting BGP (BRKRST-3320).
Vinit Jain, 3X CCIE #22854, is a Technical Lead in the HTTS (High Touch Technical Support) team, supporting customers in the areas of routing, MPLS, TE, IPv6, multicast, and a wide variety of platform issues such as high CPU and memory leaks across the IOS, IOS XE, IOS XR, and NX-OS code bases. He has been delivering training within Cisco on various technology and platform troubleshooting topics. He has also written a workbook on IOS XR fundamentals on the Cisco Support Community. Vinit holds CCIEs in R&S, SP, and Security, as well as multiple certifications in programming and databases.
Find other Ask the Expert events at https://supportforums.cisco.com/expert-corner/events.
**Ratings Encourage Participation!**
Please be sure to rate the Answers to Questions
05-26-2015 06:49 AM
Hi Vinit,
I am working on an issue wherein we are seeing very high TCAM utilization on multiple nodes (7609-S).
The device is running an engineering special version.
The only workaround is to reload the box, which is disruptive.
# Is there any non-disruptive workaround available?
# Is there a permanent fix for this issue?
# What could be the root cause: a config/design issue or a hardware/software defect?
Regards,
Avi
05-26-2015 07:01 AM
Hello Avinash
Could you please share the following logs in a file:
- show tcam count
- show tcam count detail
- show mls cef adj usage
- show mls cef hardware
- show mls cef summary
- show mls cef exception status
- show module
- show version
Do we know which TCAM is heavily utilized? Is there any event that triggers the issue?
Thanks,
Vinit
05-26-2015 08:02 PM
Router1#show module
5 2 Route Switch Processor 720 (Active) RSP720-3CXL-GE
6 2 Route Switch Processor 720 (Hot) RSP720-3CXL-GE
Image
[XXX_v151_3_s1_es-xxx-XXX_v151_3_s1_es 169]
============================================
Router1#show tcam count
                Used    Free   Percent Used   Reserved
                ----    ----   ------------   --------
 Labels:(in)      22    4074        0
 Labels:(eg)       2    4094        0
ACL_TCAM
--------
  Masks:          99    3997        2            72
  Entries:       268   32500        0           576
QOS_TCAM
--------
  Masks:           7    4089        0            18
  Entries:        42   32726        0           144
    LOU:           8     120        6
  ANDOR:           1      15        6
  ORAND:           0      16        0
    ADJ:           3    2045        0
Router1#show tcam count detail
                Used    Free   Percent Used   Reserved
                ----    ----   ------------   --------
 Labels:(in)      22    4074        0
 Labels:(eg)       2    4094        0
ACL_TCAM
--------
HI BANK
  Masks:          64    1984        3            72
  Entries:       166   16218        1           576
LOW BANK
  Masks:          35    2013        1             0
  Entries:       102   16282        0             0
QOS_TCAM
--------
HI BANK
  Masks:           0    2048        0            18
  Entries:         0   16384        0           144
LOW BANK
  Masks:           7    2041        0             0
  Entries:        42   16342        0             0
    LOU:           8     120        6
  ANDOR:           1      15        6
  ORAND:           0      16        0
    ADJ:           3    2045        0
Router1#show mls cef adj usage
Adjacency Table Size: 1048576
ACL region usage: 3
Non-stats region usage: 106051
Stats region usage: 433893
Total adjacency usage: 539947
Router1#show mls cef hardware
CEF TCAM v2:
Size: 1048576 entries
262144 rows/device, 4 device(s)
32 entries/mask-block
32768 total blocks (32b wide)
4849664 s/w table memory
Options:
sanity check: on
sanity interval: 301 seconds
consistency check: on
consistency interval: 11 seconds
redistribution: off
redistribution interval: 120 seconds
redistribution threshold: 10
compression: on
compression interval: 31 seconds
tcam/ssram shadowing: on
Operation Statistics:
Entries inserted: 0000000056771663
Entries deleted: 0000000056278100
Entries compressed: 0000000004028899
Blocks inserted: 0000000000742257
Blocks deleted: 0000000000726725
Blocks compressed: 0000000000317339
Blocks shuffled: 0000000000199869
Blocks deleted for exception: 0000000000000000
Direct h/w modifications: 0000000000000000
Background Task Statistics:
Consistency Check count: 0000000003471970
Consistency Errors: 0000000000000002
SSRAM Consistency Errors: 0000000000000000
Sanity Check count: 0000000000127602
Sanity Check Errors: 0000000000000000
Compression count: 0000000000176966
Exception Handling status : on
L3 Hardware switching status : on
Fatal Error Handling Status : Reset
Fatal Errors: 0000000000000000
Fatal Error Recovery Count: 0000000000000000
SSRAM ECC error summary:
Uncorrectable ecc entries : 0
Correctable ecc entries : 0
Packets dropped : 0
Packets software switched : 0
FIB SSRAM Entry status
----------------------
Key: UC - Uncorrectable error, C - Correctable error
SSRAM banks : Bank0 Bank1
No ECC errors reported in FIB SSRAM.
Router1# show mls cef summary
Total routes: 493566
IPv4 unicast routes: 229382
IPv4 Multicast routes: 381
MPLS routes: 262577
IPv6 unicast routes: 1226
IPv6 multicast routes: 3
EoM routes: 0
Router1#show mls cef exception status
Current IPv4 FIB exception state = FALSE
Current IPv6 FIB exception state = FALSE
Current MPLS FIB exception state = FALSE
====
05-27-2015 08:16 AM
Hello Avinash
Looking at the logs, the IPv4 + MPLS TCAM is over 90% utilized. Once it reaches 100%, the router will go into the exception state. Refer to the CCO document below:
http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/117712-problemsolution-cat6500-00.html
This document also explains how you can adjust the TCAM allocation, which can be helpful, but those changes require a reload.
It seems that the growth of the Internet routing table might be what is causing the impact on the customer's network.
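As a rough check of the utilization figure, the numbers can be recomputed from the `show mls cef summary` output posted earlier in this thread. The following is a minimal Python sketch, not a Cisco tool. It assumes the default FIB TCAM split on XL-class hardware, in which IPv4 and MPLS share a 512k-entry region; that capacity is an assumption and should be confirmed on the device with `show mls cef maximum-routes`. The function names are illustrative only.

```python
import re

# ASSUMPTION: default FIB TCAM split on an XL-class PFC (e.g. RSP720-3CXL),
# where IPv4 and MPLS share a 512k-entry region. Confirm the real limit on
# the device with "show mls cef maximum-routes".
ASSUMED_IPV4_MPLS_CAPACITY = 512 * 1024

# Output of "show mls cef summary" as posted in this thread.
SUMMARY = """\
Total routes: 493566
IPv4 unicast routes: 229382
IPv4 Multicast routes: 381
MPLS routes: 262577
IPv6 unicast routes: 1226
IPv6 multicast routes: 3
EoM routes: 0
"""


def parse_cef_summary(text):
    """Return {lower-cased label: count} for each 'label: number' line."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1).strip().lower()] = int(match.group(2))
    return counts


def ipv4_mpls_utilization(counts, capacity=ASSUMED_IPV4_MPLS_CAPACITY):
    """Return (entries used, percent of the shared IPv4 + MPLS region)."""
    used = counts["ipv4 unicast routes"] + counts["mpls routes"]
    return used, 100.0 * used / capacity


if __name__ == "__main__":
    used, percent = ipv4_mpls_utilization(parse_cef_summary(SUMMARY))
    print(f"IPv4 + MPLS entries: {used} ({percent:.1f}% of assumed capacity)")
```

With the posted counts this comes to 491959 entries, roughly 94% of the assumed region, which is consistent with the "over 90%" reading above.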
Hope this helps.
Vinit
03-02-2016 05:26 AM
Hi Vinit,
Router1#show mls cef exception status
Current IPv4 FIB exception state = FALSE
Current IPv6 FIB exception state = FALSE
Current MPLS FIB exception state = FALSE
What if all of these states are set to TRUE?
03-02-2016 07:48 AM
Hello,
This Ask the expert event is closed. Kindly post your question on the
05-27-2015 05:51 PM
Hi Vinit Jain,
My name is HJ Jung, from Korea.
I support an ISP that has a lot of WS-C6500, 7600, CRS, and other devices.
These devices have a reboot issue about once a week, so I reported it to the TAC team, but they usually answer that it is a parity error and recommend monitoring.
However, my customer and I cannot make sense of a monitoring recommendation for a parity error issue, so I have some questions, as below:
1. Can you explain what exactly a parity error is, and why this kind of issue occurs on each platform?
2. Why must a reset or reboot occur on a parity error or in other such circumstances?
3. Can we control the parity error detection schedule, or is there any other control method, via configuration?
Thanks.
05-27-2015 07:11 PM
Hello Mr. Jung
That is a really good set of questions, and ones that Cisco TAC sees very often. Below are the answers to each of them:
1. There are two kinds of parity errors: soft parity errors and hard parity errors. Soft parity errors are the ones that happen once in a while and can be treated as transient hardware issues. Hard parity errors are actual hardware faults (in the error logs you can see a Single-Bit, Double-Bit, or Multi-Bit parity error), in which case you should replace the hardware. There is a good CCO document on parity errors, shared below:
http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/116135-trouble-6500-parity-00.html
2. If you are getting parity errors very frequently in your environment, consider checking whether the engineers in your data center are using ESD protection. Parity errors can also be caused by environmental factors. The above document covers ESD practices well.
Please note that if a parity error occurs frequently on a single device, you need to replace it. It is advisable to replace the hardware if the device crashes due to a parity error more than once in 6 months. If it happens just once, we can treat it as a soft parity error and monitor for another occurrence. If it happens more than once, we treat it as a hard parity error, and it is advisable to replace the card that reported the parity error.
3. There is no configuration to control or detect parity errors. The best way to limit them is to improve ESD protection practices and make sure the environment is suitable for all those devices.
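The rule of thumb from point 2 (one parity crash: monitor; more than one within 6 months: replace) can be written down as a small check. This is only an illustrative Python sketch of that guideline, not an official TAC procedure. The function name, the return labels, and the approximation of "6 months" as 183 days are all assumptions made for the example.

```python
from datetime import date, timedelta

# ASSUMPTION: "6 months" is approximated as 183 days for this sketch.
WINDOW = timedelta(days=183)


def classify_parity_history(crash_dates, window=WINDOW):
    """Classify one card's parity-error crash dates.

    Returns "none" (no crashes), "monitor" (treat as soft parity errors)
    or "replace" (two crashes fall within the window, treat as hard).
    """
    dates = sorted(crash_dates)
    if not dates:
        return "none"
    # After sorting, the closest pair of crashes is always adjacent,
    # so comparing neighbours is enough to find a repeat in the window.
    for earlier, later in zip(dates, dates[1:]):
        if later - earlier <= window:
            return "replace"
    return "monitor"


if __name__ == "__main__":
    history = [date(2015, 1, 10), date(2015, 4, 2)]
    print(classify_parity_history(history))
```

The dates in the usage example are made up; in practice they would come from the crash records of the specific card that reported the parity error.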
In addition to the above, I would like to see one of the logs in which you hit parity errors (from either the 6500 or 7600 series routers).
Hope this information was helpful.
Thanks,
Vinit
PS: Please do rate the post if you find the information useful.
05-27-2015 11:09 PM
Hi Vinit,
Thanks for the answer, but I have some questions about the ESD process.
Do you have any guidelines on how to check ESD in the data centre?
If you have an ESD checklist or process documentation, please let me know.
Thanks.
05-27-2015 11:23 PM
Hello Hongju
I was able to find a Cisco training on ESD:
http://www.cisco.com/web/learning/le31/esd/WhatIsP.html
You can click Next to learn more about ESD.
There is another CCO document that talks a bit about ESD:
http://www.cisco.com/web/tsweb/pdf/Guidelines-and-Best-Practices.pdf
Hope this helps
Vinit
09-02-2015 02:17 PM
Thank you for all your responses.