2900 Series routers taking interface errors with matching overruns

bret · 05-02-2016

I have scoured the forums and see that others have the same problem I am having, but there seem to be differing opinions on the cause. I have several 2900 series routers running different versions of code, ranging from 15.1(4)M8 to 15.4(3)M1. Each router connects to a Cisco 2960X, with speed and duplex set to auto/auto on both the router interface and the switch interface. The WAN connection to the PE is 10 Mbps, and LAN traffic rarely hits 5 Mbps.

I started investigating because one site is experiencing what they call "extreme slowness." I found that the LAN interface of their 2911 was taking input errors with a matching number of overruns; today's error count is over 4,500. In the 50 minutes since the counters were last cleared: 1036 input errors, 0 CRC, 0 frame, 1036 overrun, 0 ignored. I have other sites with a similar amount of traffic taking no errors, so there is a problem somewhere. Since there are no CRC errors, I have ruled out a bad cable, but I plan to move the LAN connection to G0/2 to see if the errors follow.

Troubleshooting further, I found that the server at the site was using the router as its default gateway, so I had it changed to the switch's SVI. Unfortunately, this change had no impact and the errors are still occurring. I have started a bug check to see whether a code upgrade could help. Has anyone seen these errors on their 2900s? If so, how did you fix them?

All help is appreciated!

Leo Laohoo · 05-03-2016

Kindly post the complete output of the command "sh interface <BLAH>".

bret · 05-03-2016

Thanks for helping out, Leo. See below: router interface first, then switch interface.

interface GigabitEthernet0/0
description Connection to TXDAD-CAS1 Gi1/0/48
ip address 192.168.147.2 255.255.255.192
no ip redirects
no ip unreachables
no ip proxy-arp
ip pim sparse-dense-mode
ip flow monitor netflow-mon input
standby 0 ip 192.168.147.1
standby 0 priority 105
standby 0 preempt
standby 0 track 1 decrement 10
ip cgmp
duplex auto
speed auto
service-policy output LAN-QoS
service-policy type performance-monitor input voice-data

GigabitEthernet0/0 is up, line protocol is up
Hardware is CN Gigabit Ethernet, address is 881d.fc23.0ad0 (bia 881d.fc23.0ad0)
Description: Connection to TXDAD-CAS1 Gi1/0/48
Internet address is 192.168.147.2/26
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full Duplex, 1Gbps, media type is RJ45
output flow-control is unsupported, input flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters 16:25:07
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1047
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 187000 bits/sec, 51 packets/sec
5 minute output rate 111000 bits/sec, 49 packets/sec
     11759356 packets input, 3860528667 bytes, 0 no buffer
     Received 321074 broadcasts (296 IP multicasts)
     0 runts, 0 giants, 0 throttles
     3060 input errors, 0 CRC, 0 frame, 3060 overrun, 0 ignored
     0 watchdog, 314448 multicast, 0 pause input
     11451309 packets output, 3824073170 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

interface GigabitEthernet1/0/48
description TXDAD-CWR1 G0/0
switchport access vlan 900
switchport mode access
srr-queue bandwidth share 1 30 35 5
priority-queue out
mls qos trust dscp
auto qos trust
spanning-tree portfast

GigabitEthernet1/0/48 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is bc16.f562.ce30 (bia bc16.f562.ce30)
Description: TXDAD-CWR1 G0/0
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:13, output 00:00:33, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 344
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 118000 bits/sec, 60 packets/sec
5 minute output rate 183000 bits/sec, 61 packets/sec
     156742566 packets input, 120458239362 bytes, 0 no buffer
     Received 389560 broadcasts (318893 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 318893 multicast, 0 pause input
     0 input packets with dribble condition detected
     159765333 packets output, 116258976137 bytes, 0 underruns
     0 output errors, 0 collisions, 3 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out
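To track this across several routers, the receive counters can be pulled out of "show interface" text and turned into rates. Below is a minimal Python sketch run against the router's Gi0/0 output above. It assumes the counter wording shown here (other platforms or releases may word it differently) and a "Last clearing" time in hh:mm:ss form (longer periods print differently and are not handled). parse_counters and summarize are hypothetical helpers, not a Cisco tool.

```python
import re

# Excerpt of the router's "show interface GigabitEthernet0/0" output above.
SHOW_INT = """\
Last clearing of "show interface" counters 16:25:07
     11759356 packets input, 3860528667 bytes, 0 no buffer
     3060 input errors, 0 CRC, 0 frame, 3060 overrun, 0 ignored
"""


def parse_counters(text: str) -> dict:
    """Extract a few receive-side counters from IOS 'show interface' text.

    Assumes the counter wording above and an hh:mm:ss clearing time.
    """
    counters = {}
    h, m, s = re.search(r"counters (\d+):(\d+):(\d+)", text).groups()
    counters["elapsed_s"] = int(h) * 3600 + int(m) * 60 + int(s)
    pkts, octets = re.search(r"(\d+) packets input, (\d+) bytes", text).groups()
    counters["packets_in"] = int(pkts)
    counters["bytes_in"] = int(octets)
    # "3060 input errors, 0 CRC, ..." -> {"input errors": 3060, "CRC": 0, ...}
    err_line = re.search(r"^\s*(\d+ input errors.*)$", text, re.MULTILINE).group(1)
    for value, name in re.findall(r"(\d+) ([A-Za-z ]+)", err_line):
        counters[name.strip()] = int(value)
    return counters


def summarize(c: dict) -> dict:
    minutes = c["elapsed_s"] / 60
    return {
        # Share of input errors that are overruns (1.0 means overruns only).
        "overrun_share": c["overrun"] / c["input errors"],
        "overruns_per_min": c["overrun"] / minutes,
        # Input errors relative to packets input.
        "error_ratio": c["input errors"] / c["packets_in"],
        "avg_rx_packet_bytes": c["bytes_in"] / c["packets_in"],
    }


stats = summarize(parse_counters(SHOW_INT))
for key, value in stats.items():
    print(f"{key}: {value:.4g}")
```

For this output it reports that every input error is an overrun, roughly 3.1 overruns per minute over the 16:25:07 since clearing, input errors equal to about 0.026% of packets input, and an average received packet of about 328 bytes. For comparison, the 1036 overruns in 50 minutes from the first post work out to about 21 per minute.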

Leo Laohoo · 05-03-2016

Gi0/0 is of concern because after 16 hours the "input errors" counter keeps creeping up; however, the CRC count is "0". No CRC errors means it is NOT a speed/duplex mismatch. Creeping CRC errors would mean a Layer 1 issue with the copper line.
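The reasoning above can be written down as a small decision function. This is a rough Python sketch of the rule of thumb discussed in this thread, not an official Cisco procedure; triage and its category labels are hypothetical, and a real diagnosis also needs context such as whether the counters are still increasing.

```python
def triage(input_errors: int, crc: int, frame: int, overrun: int, ignored: int) -> str:
    """Rule-of-thumb reading of IOS receive-side error counters (not exhaustive)."""
    if input_errors == 0:
        return "clean"
    if crc > 0 or frame > 0:
        # Frames arrived corrupted: suspect cabling, a bad port, or a duplex mismatch.
        return "layer 1 or duplex"
    if overrun > 0 or ignored > 0:
        # Frames arrived intact, but the receiver had nowhere to put them in time.
        return "receiver overwhelmed"
    # Errors counted under other categories (runts, giants, etc.).
    return "other"


# Counters from the router's Gi0/0 after 16 hours.
print(triage(input_errors=3060, crc=0, frame=0, overrun=3060, ignored=0))  # receiver overwhelmed
```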

There is an "easy" test called TDR (Time Domain Reflectometry); however, routers do NOT support TDR, and only selected Cisco switches do. TDR helps determine whether there is a problem with the copper line and WHERE the problem is (how far from the port).

bret · 05-04-2016

Leo, the TDR test was the first thing I did, and it came back good (see below). I think it's something other than a cable issue. Both ports are gig, so I'm not sure how it could be a bandwidth problem. When I found the server pointing to the router as its default gateway, I thought that was the problem, but after changing the server the problem persists. It's got to be something else, and it's affecting multiple 2900 routers.

TDR test last run on: May 03 05:59:04

Interface Speed Local pair Pair length        Remote pair Pair status
--------- ----- ---------- ------------------ ----------- --------------------
Gi1/0/48  1000M Pair A     30   +/- 10 meters Pair A      Normal
                Pair B     30   +/- 10 meters Pair B      Normal
                Pair C     30   +/- 10 meters Pair C      Normal
                Pair D     30   +/- 10 meters Pair D      Normal
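If you run TDR across many switch ports (on Catalyst switches it is typically started with "test cable-diagnostics tdr interface" and read back with "show cable-diagnostics tdr interface"), a short script can flag any pair that is not Normal. This Python sketch assumes the column layout posted above; other models and releases format the table differently, and multi-word statuses would only be captured up to the first space. parse_tdr and cable_ok are hypothetical helpers.

```python
import re

# The TDR result posted above.
TDR_OUTPUT = """\
Gi1/0/48  1000M Pair A     30   +/- 10 meters Pair A      Normal
                Pair B     30   +/- 10 meters Pair B      Normal
                Pair C     30   +/- 10 meters Pair C      Normal
                Pair D     30   +/- 10 meters Pair D      Normal
"""

# One match per local pair: length, tolerance, remote pair, and status.
PAIR_RE = re.compile(
    r"Pair (?P<local>[A-D])\s+(?P<length_m>\d+)\s+\+/-\s+(?P<tolerance_m>\d+) meters\s+"
    r"Pair (?P<remote>[A-D])\s+(?P<status>\S+)"
)


def parse_tdr(text: str) -> list[dict]:
    return [m.groupdict() for m in PAIR_RE.finditer(text)]


def cable_ok(pairs: list[dict]) -> bool:
    """True only if all four pairs were reported and every one is Normal."""
    return len(pairs) == 4 and all(p["status"] == "Normal" for p in pairs)


pairs = parse_tdr(TDR_OUTPUT)
for p in pairs:
    print(p["local"], p["length_m"], p["status"])
print("cable ok:", cable_ok(pairs))
```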

Leo Laohoo · 05-04-2016

Interesting... My hunch is an IOS bug, but you've already tried two versions.

bret · 05-05-2016

I have not tried a different version on this router yet, but that is my next step. I was pointing out that I have other routers running different versions with the same issue.

Richard Bradfield · 05-04-2016

Please see this old thread; perhaps it will help:

https://supportforums.cisco.com/document/13796/overruns-counter-show-interfaces-command-output-increasing

Richard

bret · 05-04-2016

Thanks for the link, Richard. Unfortunately, I don't believe it pertains to this problem. The LAN interface on this router has taken 309 errors today. My monitoring tool showed a spike of 150 errors at 12:15 this morning; when I match that against bandwidth, traffic was flowing at only 300 kbps. Since both the switch interface and the router interface are gig, I don't believe the interface capacity was reached. This is a very puzzling problem. I moved the LAN connection to G0/2 and reloaded the router; hopefully the errors stop.

casanavep · 05-05-2016

I have seen errors like this associated with microbursts: small, often sub-second bursts of traffic that overrun buffers. Because each spike is so short, it never shows up in the traffic-rate counters. One thing you may want to do is change the interface load interval from the default 5-minute average to 30 seconds, using the "load-interval 30" command under the interface. A 5-minute average hides even a 30-second burst, no matter how often you refresh your view of "show interface".
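To see why the averaging interval matters, here is a small Python model: five minutes of one-second traffic samples at the 187 kbps baseline from the router output, plus one hypothetical 100 ms burst at gigabit line rate. The burst size and timing are invented for illustration, and window_averages is a hypothetical helper.

```python
def window_averages(bits_per_second, window_s):
    """Average rate in bits/s over consecutive, non-overlapping windows."""
    return [
        sum(bits_per_second[i:i + window_s]) / window_s
        for i in range(0, len(bits_per_second), window_s)
    ]


BASELINE_BPS = 187_000              # steady load, from the router's 5-minute input rate
LINE_RATE_BPS = 1_000_000_000       # GigabitEthernet
BURST_BITS = LINE_RATE_BPS // 10    # hypothetical 100 ms burst at line rate

samples = [BASELINE_BPS] * 300      # five minutes of one-second samples
samples[150] += BURST_BITS          # the burst lands in second 150

five_min = window_averages(samples, 300)
thirty_s = window_averages(samples, 30)
print(f"5-minute average:      {five_min[0] / 1e6:.2f} Mbps")
print(f"peak 30-second window: {max(thirty_s) / 1e6:.2f} Mbps")
print(f"peak 1-second sample:  {max(samples) / 1e6:.2f} Mbps")
```

The 5-minute average comes out around 0.52 Mbps and the busiest 30-second window around 3.52 Mbps, while even one-second sampling shows only about 100 Mbps for a burst that actually ran at 1 Gbps. Shorter intervals help, but since IOS does not accept a load-interval below 30 seconds, bursts this short are only really visible in a packet capture.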

Glennmar Milliner · 05-06-2016

I think the answers from Richard Bradfield and casanavep are the closest. You seem to be having only one kind of problem, which shows up as overruns. In a nutshell, an overrun is traffic that the router's hardware was not able to handle for some reason. Since you are connected to a switch, it may be sending traffic to the router at line rate, faster than the router can handle. Even if both are Gig interfaces, that does not mean their hardware performs the same. Most switches do not perform extra checks and just forward traffic, so they are able to send and receive a little faster.

Another factor to keep in mind is that you could be sending more packets without consuming more bandwidth. It is a little confusing, but if I send 100 packets of 1400 bytes each, I am sending 140,000 bytes of data. I could also send the same amount of data in 64-byte packets, but then I end up sending over 2187 packets. So instead of processing just 100 packets, the router ends up processing more than 2187. (Not sure I am explaining it in a way everyone will understand.) This is where microbursts could come in. It is normal in networks that carry many small packets, such as voice. It all depends on the packet size being sent.
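The packet-count arithmetic above, plus the standard Ethernet per-frame wire overhead, fits in a few lines of Python. This sketch treats the sizes as whole Ethernet frames and, like the arithmetic above, ignores per-packet headers when counting packets; the helper names are hypothetical.

```python
import math

# Bytes each Ethernet frame also occupies on the wire:
# 7-byte preamble + 1-byte start delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_needed(total_bytes: int, packet_bytes: int) -> int:
    """Packets required to carry total_bytes when each packet holds packet_bytes."""
    return math.ceil(total_bytes / packet_bytes)


def line_rate_pps(link_bps: int, frame_bytes: int) -> float:
    """Maximum frames per second a link can deliver at a given Ethernet frame size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


print(packets_needed(140_000, 1400))              # 100
print(packets_needed(140_000, 64))                # 2188
print(round(line_rate_pps(1_000_000_000, 64)))    # 1488095
print(round(line_rate_pps(1_000_000_000, 1400)))  # 88028
```

A gigabit port can deliver roughly 1.49 million 64-byte frames per second but only about 88,000 1400-byte frames, so a switch forwarding small packets at line rate can hand the router far more packets per second than the megabit figures suggest.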

This is a never-ending debate, but you may want to look at the following links, which explain the meaning and causes of overruns and how to spot microbursts in a packet capture. There you will be able to confirm whether your network has bursty traffic.

NOTE: keep in mind that you are comparing how a Gig interface works on a switch versus on a router. These are two different network devices, designed differently, so we would expect underlying differences.

Troubleshooting Ethernet
http://www.cisco.com/en/US/docs/internetworking/troubleshooting/guide/tr1904.html

This document is specific to switches, but once you have the pcap file from the router or switch, the analysis is the same:
Wireshark Use to Identify Bursty Traffic on Catalyst Switches
http://www.cisco.com/c/en/us/support/docs/lan-switching/switched-port-analyzer-span/116260-technote-wireshark-00.html

Configuration Example: Embedded Packet Capture on Cisco IOS and IOS XE
https://supportforums.cisco.com/document/139686/configuration-example-embedded-packet-capture-cisco-ios-and-ios-xe

Router IP Traffic Export Packet Capture Enhancements
http://www.cisco.com/c/en/us/td/docs/ios/12_4t/12_4t11/ht_rawip.html

Have fun!!!

bret · 05-27-2016

After digging into my problem further, I found another device on the LAN using the router as its default gateway (DFG) instead of the SVI on the switch. Correcting that server lowered my overruns. Next, I added "flowcontrol receive on" to the switch interface that uplinks to the router. After that, the overruns stopped. Hopefully this helps the next guy/gal.
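In general terms, 802.3x flow control lets a congested receiver send pause frames, and "flowcontrol receive on" makes the switch honor them by holding frames in its own buffer instead of transmitting into a full receive ring. The router output posted earlier lists flow control as unsupported, so treat this as an illustration of the general mechanism rather than a confirmed explanation of this 2911. The Python toy model below uses invented ring sizes, rates, and thresholds with a simplified tick-based timing; simulate is a hypothetical helper, not a model of 2900 or 2960-X hardware.

```python
def simulate(offered, ring_size, line_rate, drain_rate, pause_enabled, xoff):
    """Toy model of a router receive ring fed by a switch port, one step per tick.

    offered: packets arriving at the switch for the router in each tick.
    Returns (overruns, delivered, still_queued_in_switch).
    """
    ring = 0            # packets waiting in the router's receive ring
    switch_queue = 0    # packets held back in the switch's egress buffer
    overruns = delivered = 0
    paused = False
    for arriving in offered:
        switch_queue += arriving
        # The switch sends at most line_rate packets per tick, and none while
        # paused (only when it honors pause frames, i.e. "flowcontrol receive on").
        sent = 0 if (pause_enabled and paused) else min(switch_queue, line_rate)
        switch_queue -= sent
        accepted = min(sent, ring_size - ring)
        overruns += sent - accepted      # no free ring slot: counted as an overrun
        ring += accepted
        drained = min(ring, drain_rate)  # the router processes part of the ring
        ring -= drained
        delivered += drained
        # The receiver requests a pause while the ring is above the high-water mark.
        paused = ring >= xoff
    return overruns, delivered, switch_queue


burst = [16] * 20 + [0] * 40  # 20 ticks at line rate, then idle
print(simulate(burst, ring_size=64, line_rate=16, drain_rate=8, pause_enabled=False, xoff=48))  # (104, 216, 0)
print(simulate(burst, ring_size=64, line_rate=16, drain_rate=8, pause_enabled=True, xoff=48))   # (0, 320, 0)
```

Choosing xoff + line_rate <= ring_size guarantees in this model that one tick of reaction delay can never overflow the ring. Flow control does not create capacity: it moves the queueing to the switch, so a long enough burst would show up as output drops on the switch port instead of overruns on the router.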