Overrun error issue

nasef2010
Level 1

Hello All ,

My company has a Cisco router (ASR1006) acting as an edge service provider router. Recently I ran into a tedious issue: a port-channel interface is counting overrun errors. There are three 10G member interfaces in that port-channel, and only one of them is counting overrun errors. I tried steering the traffic away from that interface and excluding it from the port-channel. The result is that another 10G interface, which was previously clear of errors, is now counting errors. In addition, two interfaces that run IS-IS and are locally connected to another Cisco router are seeing heavy IS-IS flapping and are also counting overrun errors.

Router total traffic (in/out): 32.46 Gbps / 33.16 Gbps

Any help or suggestions on how to find the root cause of this issue? I will share any requested logs.

Leo Laohoo
Hall of Fame
What are the RP and ESP models?
Interface overruns mean the downstream client is sending too much data to the router, and the router doesn't have enough "time" to process the traffic, so the receive buffer gets "overrun".
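To make the "buffer gets overrun" idea concrete, here is a toy Python model of a fixed-size receive FIFO drained at a fixed rate. It is a sketch under simplifying assumptions (discrete ticks, a constant drain rate, and a hypothetical function name `simulate_input_fifo`), not a description of how the ASR1000 SPA or ESP actually buffers traffic.

```python
def simulate_input_fifo(arrivals_per_tick, drain_per_tick, fifo_size):
    """Count packets lost as overruns by a fixed-size input FIFO (toy model).

    Each tick, packets arrive first; any arrival that finds the FIFO full is
    counted as an overrun. Then the forwarding engine drains up to
    drain_per_tick packets from the FIFO.
    """
    queued = 0
    overruns = 0
    for arriving in arrivals_per_tick:
        # Accept arrivals until the FIFO is full; the rest are overruns.
        accepted = min(arriving, fifo_size - queued)
        overruns += arriving - accepted
        queued += accepted
        # The forwarding engine drains at a fixed rate.
        queued -= min(queued, drain_per_tick)
    return overruns


# A single burst: the average (10 per tick) equals the drain rate, yet packets are lost.
print(simulate_input_fifo([30, 0, 0], drain_per_tick=10, fifo_size=20))  # 10
# A small sustained overload fills the FIFO, then overruns every tick.
print(simulate_input_fifo([12] * 10, drain_per_tick=10, fifo_size=20))   # 10
```

Both patterns, short microbursts and a steady load slightly above what the forwarding path can drain, end up in the same `overrun` counter, which is why the counter alone does not say which one is happening.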

Here you are.
#show platform

Chassis type: ASR1006

Slot Type State Insert time (ago)
--------- ------------------- --------------------- -----------------
R0 ASR1000-RP2 ok, standby 2w0d
R1 ASR1000-RP2 ok, active 2w0d
F0 ASR1000-ESP40 ok, active 2w0d
F1 ASR1000-ESP40 ok, standby 2w0d



Ok, so ESP40.
Post the output of the command "sh interface <PORT>", minus the interface description and the IP address.
Let us see how bad this is.

Thanks for your reply

#show interfaces Te2/1/0
Load for five secs: 46%/15%; one minute: 49%; five minutes: 50%
Time source is NTP, 14:28:09.744 CLT Wed Apr 8 2020

TenGigabitEthernet2/1/0 is up, line protocol is up
Hardware is SPA-1X10GE-L-V2, address is c84c.75af.ff90 (bia c84c.75af.ff90)
MTU 9100 bytes, BW 10000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 168/255, rxload 38/255
Encapsulation 802.1Q Virtual LAN, Vlan ID 1., loopback not set
Keepalive not supported
Full Duplex, 10000Mbps, link type is force-up, media type is 10GBase-SR/SW
output flow-control is on, input flow-control is on
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:04, output hang never
Last clearing of "show interface" counters 1w1d
Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 24235
Queueing strategy: fifo
Output queue: 0/40 (size/max)
30 second input rate 1497695000 bits/sec, 532723 packets/sec
30 second output rate 6595753000 bits/sec, 766913 packets/sec
504750695701 packets input, 324905979760552 bytes, 0 no buffer
Received 15127 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
1508980419 input errors, 0 CRC, 0 frame, 1508980419 overrun, 0 ignored
0 watchdog, 2099044 multicast, 0 pause input
594255750974 packets output, 606829395384527 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
26450 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

Yesterday's log:
#show interface TenGigabitEthernet2/1/0 | i err
974143960 input errors, 0 CRC, 0 frame, 974143960 overrun, 0 ignored
548023627429 packets output, 557928747074180 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 babbles, 0 late collision, 0 deferred
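Since two readings of the same overrun counter are posted, the growth between them can be computed directly. A minimal Python sketch with hypothetical helper names; the exact time of yesterday's reading isn't given, so it stops at the raw delta rather than guessing a per-second rate. The ratio compares overruns with the `packets input` counter from today's output.

```python
def counter_delta(previous, current):
    """Growth of a cumulative interface counter between two readings."""
    if current < previous:
        # A lower reading means the counters were cleared (or wrapped) in between.
        raise ValueError("counter decreased between readings")
    return current - previous


def overrun_ratio(overruns, packets_input):
    """Overruns expressed as a fraction of the 'packets input' counter."""
    return overruns / packets_input


# Te2/1/0 overrun counter: yesterday's reading, then today's.
delta = counter_delta(974143960, 1508980419)
print(delta)  # 534836459

# Today's totals since the counters were last cleared (1w1d ago).
print(f"{overrun_ratio(1508980419, 504750695701):.2%}")  # 0.30%
```

Over half a billion overruns accumulated between the two readings, so the problem is ongoing rather than a one-off event.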


Another port-channel that is counting errors:

#show interfaces Port-channel10
Load for five secs: 46%/15%; one minute: 47%; five minutes: 48%
Time source is NTP, 14:34:37.024 CLT Wed Apr 8 2020

Port-channel10 is up, line protocol is up
Hardware is 10GEChannel, address is c84c.75b0.00c9 (bia c84c.75b0.00c9)
Internet address is 172.17.15.113/29
MTU 9100 bytes, BW 20000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 30/255, rxload 137/255
Encapsulation 802.1Q Virtual LAN, Vlan ID 1., loopback not set
Keepalive set (10 sec)
ARP type: ARPA, ARP Timeout 04:00:00
No. of active members in this channel: 2
Member 0 : TenGigabitEthernet2/3/0 , Full-duplex, 10000Mb/s
Member 1 : TenGigabitEthernet0/3/0 , Full-duplex, 10000Mb/s
No. of PF_JUMBO supported members in this channel : 2
Last input 00:00:00, output never, output hang never
Last clearing of "show interface" counters never
Input queue: 0/750/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/80 (size/max)
30 second input rate 10776741000 bits/sec, 1096514 packets/sec
30 second output rate 2387503000 bits/sec, 678411 packets/sec
1178967586097 packets input, 1436122083560409 bytes, 0 no buffer
Received 10499166 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
2187850351 input errors, 0 CRC, 0 frame, 2187850351 overrun, 0 ignored
0 watchdog, 8495925 multicast, 0 pause input
733526320204 packets output, 343272894663003 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
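The `txload`/`rxload` values in both outputs can be cross-checked against the 30-second rates and the `BW` line. A minimal Python sketch, assuming load is simply rate divided by BW and scaled to 255 (IOS actually averages load over time, so this only lines up when traffic is steady, as it happens to here); `ios_load` is a hypothetical name.

```python
def ios_load(rate_bps, bw_kbit):
    """Approximate IOS load value (x/255) from a bit rate and the BW line."""
    return round(rate_bps / (bw_kbit * 1000) * 255)


# Te2/1/0: BW 10000000 Kbit/sec
print(ios_load(1497695000, 10000000))   # 38  (rxload 38/255)
print(ios_load(6595753000, 10000000))   # 168 (txload 168/255)

# Port-channel10: BW 20000000 Kbit/sec (two 10G members)
print(ios_load(10776741000, 20000000))  # 137 (rxload 137/255)
print(ios_load(2387503000, 20000000))   # 30  (txload 30/255)
```

So Port-channel10 is receiving about 10.8 Gbps, roughly 54% of the bundle. If the hash spreads it evenly (an assumption; EtherChannel hashing is often uneven), each member averages around 5.4 Gbps, well below 10G line rate.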

Seems to me that the ESP40 can't take that much traffic. We ran into the same problem: the limit was 20 Gbps in and 20 Gbps out, and anything more produced those overruns. We upgraded to an ASR1009X with an ESP200X. That solved the problem.

Use the command 

Show Platform Hardware QFP Active Datapath Utilization Summary

Add up the In load and the Out load. If the total is more than 40 Gbps, it's the ESP that needs an upgrade. If I remember correctly, you can replace it with an ESP100, which fits into an ASR1006 RP2 setup without having to replace the whole box.

 https://www.cisco.com/c/en/us/support/docs/routers/asr-1000-series-aggregation-services-routers/200674-Throughput-issues-on-ASR1000-Series-rout.html#anc14 
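The rule of thumb in this reply is plain arithmetic: add input and output load and compare the sum with the ESP's rating. A minimal Python sketch applying it to the router totals posted in the question (32.46 G in / 33.16 G out, read here as Gbps). The capacity table holds nominal ratings and is an assumption; the output of `show platform hardware qfp active datapath utilization summary` remains the real measurement.

```python
# Nominal ESP throughput ratings in Gbps (assumption: aggregate in + out,
# as the reply above treats them).
ESP_CAPACITY_GBPS = {"ESP40": 40, "ESP100": 100}


def esp_headroom(in_gbps, out_gbps, esp):
    """Return (total load, remaining headroom) in Gbps; negative headroom means oversubscribed."""
    total = in_gbps + out_gbps
    return total, ESP_CAPACITY_GBPS[esp] - total


for esp in ("ESP40", "ESP100"):
    total, headroom = esp_headroom(32.46, 33.16, esp)
    print(f"{esp}: load {total:.2f} Gbps, headroom {headroom:.2f} Gbps")
# ESP40: load 65.62 Gbps, headroom -25.62 Gbps
# ESP100: load 65.62 Gbps, headroom 34.38 Gbps
```

By this rule of thumb, the posted totals are well above an ESP40's nominal rating.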

2187850351 input errors, 0 CRC, 0 frame, 2187850351 overrun, 0 ignored

(I am sorry.  I just saw a reminder for this thread.)

This is a layer 1 issue.  Just look at the size of those line errors.

Hello,

On a side note, I think the ASR supports aggregate EtherChannel QoS, so you might want to try something like the configuration below (it shapes to 5 Gbps in this example; change that accordingly):

platform qos port-channel-aggregate 10
!
policy-map SHAPE_PC
 class class-default
  shape average 5000000000
!
interface port-channel 10
 service-policy output SHAPE_PC
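For readers unfamiliar with `shape average`: shaping is generally built on a token bucket that delays traffic exceeding the configured rate instead of dropping it. The Python class below is a minimal, generic sketch of that technique, not the ASR's QFP implementation; the class name and the `burst_bits` parameter are assumptions, and the burst size IOS XE actually picks for `shape average 5000000000` is not modeled.

```python
class TokenBucketShaper:
    """Generic token-bucket shaper: delays (never drops) traffic above rate_bps."""

    def __init__(self, rate_bps, burst_bits):
        self.rate_bps = rate_bps          # sustained rate, e.g. 5_000_000_000 for 5 Gbps
        self.burst_bits = burst_bits      # bucket depth (hypothetical value in the example)
        self.tokens = float(burst_bits)   # start with a full bucket
        self.last_time = 0.0

    def _refill(self, now):
        """Add tokens for the time elapsed since the last update, capped at the bucket depth."""
        elapsed = now - self.last_time
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        self.last_time = now

    def send_time(self, arrival, packet_bits):
        """Return the time at which a packet arriving at `arrival` may be transmitted."""
        if packet_bits > self.burst_bits:
            raise ValueError("packet is larger than the bucket depth")
        # FIFO shaping queue: a packet cannot leave before the previous one did.
        now = max(arrival, self.last_time)
        self._refill(now)
        if self.tokens < packet_bits:
            # Wait just long enough for the missing tokens to accumulate.
            now += (packet_bits - self.tokens) / self.rate_bps
            self._refill(now)
        self.tokens -= packet_bits
        return now


shaper = TokenBucketShaper(rate_bps=1_000, burst_bits=1_000)
print([shaper.send_time(0.0, 1_000) for _ in range(3)])  # [0.0, 1.0, 2.0]
```

With a 1 kbps rate and a 1,000-bit bucket, three back-to-back 1,000-bit packets leave at 0, 1, and 2 seconds: the first spends the full bucket, and the rest are spaced out to the configured rate. For the policy above, the rate would be `rate_bps=5_000_000_000`.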

We are observing overruns even after moving the affected interface out of the port-channel, and that's expected, as the issue is with the port-channel. My recommendation is to try using class of service.

At StatSeeker we have customers who are getting overrun errors on high-end Cisco routers. We just released a new SNMP monitoring capability that collects this extended MIB data for a number of metrics, including overrun errors. I am curious whether this could be related to microbursts in the data center. You can run StatSeeker network monitoring on your router for a 45-day free trial to monitor these overrun errors, set thresholds, and report on packet-loss issues that could affect delay and jitter.
