09-03-2009 07:16 AM
We have two DS3 connections via BGP to our DR site.
BGP is configured to allow the two paths, and the host to host load is being distributed pretty evenly across the two DS3s.
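For reference, a setup like this is typically done with BGP multipath; a minimal sketch, assuming hypothetical AS numbers and neighbor addresses (not taken from this post):

router bgp 65001
 neighbor 192.0.2.1 remote-as 65100
 neighbor 192.0.2.5 remote-as 65100
 ! install up to two equal eBGP paths so CEF can load-share across both DS3s
 maximum-paths 2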
We are using these links to sync our data from HQ to DR, and the data flow is bursty, ranging from about 15-20% utilization on both links up to 100%.
The source devices connect through gigabit ports, with up to two gigabit ports uplinking to the HQ 7206, across to DR, where a single FastE port links from the 7206 to the destination device.
The traffic is from HQ to DR 99% of the time.
I am seeing output drops accumulate on both serial links on the source side even when the utilization is low.
It seems to be bursty: the drops increment for a few seconds, then not for a couple of minutes, then the pattern repeats.
There are no drops on the ethernet links on either side.
There is a QoS policy outbound on the serial interfaces, with the default class configured for "fair-queue".
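The policy is along these lines (a sketch of its shape, not the exact config; the policy-map name is a placeholder):

policy-map WAN-EDGE-OUT
 class class-default
  ! flow-based fair queueing in the default class
  fair-queue
!
interface Serial1/0
 service-policy output WAN-EDGE-OUT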
I can understand drops during high bandwidth utilization, but not during low utilization. Is this something I should be concerned about?
It seems to be a small amount; could it just be nominal circuit issues?
#sh int summary
*: interface is up
IHQ: pkts in input hold queue IQD: pkts dropped from input queue
OHQ: pkts in output hold queue OQD: pkts dropped from output queue
RXBS: rx rate (bits/sec) RXPS: rx rate (pkts/sec)
TXBS: tx rate (bits/sec) TXPS: tx rate (pkts/sec)
TRTL: throttle count
  Interface            IHQ   IQD   OHQ      OQD      RXBS   RXPS      TXBS   TXPS   TRTL
-------------------------------------------------------------------------------------------
* GigabitEthernet0/1     0     0     0        0   9847000   2216   1517000   1178      0
* GigabitEthernet0/2     0     0     0        0      7000      1         0      0      0
* Serial1/0              0     0     0   168240    509000    455   6050000   2096      0
* Serial2/0              0     0     0   686957   1659000   1491   4572000    916      0
Serial1/0 is up, line protocol is up
Hardware is M2T-T3+ pa
Description: connected to MCI DS3 Disaster Recovery
MTU 4470 bytes, BW 44210 Kbit, DLY 200 usec,
reliability 255/255, txload 36/255, rxload 2/255
Encapsulation FRAME-RELAY IETF, crc 16, loopback not set
Keepalive set (10 sec)
Restart-Delay is 0 secs
LMI enq sent 0, LMI stat recvd 0, LMI upd recvd 0
LMI enq recvd 15917, LMI stat sent 15917, LMI upd sent 0, DCE LMI up
LMI DLCI 0 LMI type is ANSI Annex D frame relay DCE
FR SVC disabled, LAPF state down
Broadcast queue 0/256, broadcasts sent/dropped 82240/0, interface broadcasts 187718
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters 1d20h
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 173244
Queueing strategy: Class-based queueing
Output queue: 0/1000/64/173244 (size/max total/threshold/drops)
Conversations 0/45/256 (active/max active/max total)
Reserved Conversations 0/0 (allocated/max allocated)
Available Bandwidth 24157 kilobits/sec
30 second input rate 455000 bits/sec, 413 packets/sec
30 second output rate 6250000 bits/sec, 2253 packets/sec
147250390 packets input, 17700821862 bytes, 0 no buffer
Received 5646 broadcasts, 0 runts, 0 giants, 0 throttles
0 parity
178 input errors, 153 CRC, 0 frame, 24 overrun, 0 ignored, 1 abort
309803920 packets output, 112379404547 bytes, 0 underruns
0 output errors, 0 applique, 0 interface resets
0 unknown protocol drops
0 output buffer failures, 0 output buffers swapped out
0 carrier transitions
rxLOS inactive, rxLOF inactive, rxAIS inactive
txAIS inactive, rxRAI inactive, txRAI inactive
09-03-2009 08:35 AM
Looking at your interface statistics, you have CRC errors on the link:
153 CRC
It looks like you cleared your counters out almost two days ago. Do you have this same thing on the other side? You shouldn't have any, so I would get in touch with the provider to see if they are seeing errors on their end as well.
HTH,
John
09-03-2009 08:57 AM
Thanks for the reply.
I was thinking that, given the amount of traffic that has gone through that link, 153 CRC errors was pretty much nothing.
And there are many more drops than CRC errors in comparison.
You think it could be an issue?
09-03-2009 09:01 AM
153 isn't a lot, but you should have 0. I don't know if that's causing your problem, but it definitely could go hand-in-hand. I had this problem with a DS3 once, where I had to prove to AT&T that it wasn't my problem. I was getting 1 CRC every 5 minutes. We eventually moved loops farther and farther out through the network until they found a switch with a bad card, and they had to move us off of it. That resolved the issue. I also wasn't able to get faster than 15 Mbps on a 40 Mbps line.
I would talk to the provider and have them put up a loop to see where in their path the problem lies relative to your end.
P.S. Having them do a loop will kill your connection while they're testing, so if this is production I would have them do it after-hours.
HTH,
John
09-03-2009 10:22 AM
"I can understand drops during high bandwidth utilization, but not during low utilization. Is this something I should be concerened about?"
Perhaps; it's not so much a question of drops but whether there are too many drops. Type of traffic is an important consideration (TCP vs. non-TCP).
Low vs. high utilization, with regard to packet dropping, can be misleading. Utilization is based on average usage over some time period. Much can happen within milliseconds. If traffic is "bursty", as you note, it's possible the drops are due to transient congestion.
For many TCP implementations, with too many drops, "average" utilization can actually decrease, "hiding" a bandwidth oversubscription issue.
In other words, your expectations about drops being tied to low vs. high utilization don't always hold true.
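To put rough, purely illustrative numbers on that:

44,210,000 bit/s x 0.100 s = ~4.4 Mbit sent in a single 100 ms line-rate burst
4.4 Mbit / 30 s = ~147 kbit/s added to the 30-second average (about 0.3% of line rate)

So a burst long enough to overflow the output queue and drop packets barely moves the averaged utilization counters.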
"It seems to be a small amount, could it be just nominal circuit issues?"
Unlikely to be a "circuit issue", beyond transient congestion.
Packet dropping within TCP flows, on any network segment that can be oversubscribed, is a normal part of TCP bandwidth probing. (In fact, in theory it should be much more common than it is in practice; but also in practice, many host TCP implementations don't provide a large enough default RWIN, which often precludes the offered transfer rate from exceeding the available bandwidth.)
Depending on why you're seeing drops, it might be almost impossible to mitigate them on most Cisco devices, or you might be able to decrease them such that there are almost none, or none at all. For the latter, correct setting of queue depths and/or RED usage might impact your drops.
For your posted stats, the overall drop rate appears low enough that it might not be worth the time, or much time, to try to decrease it further. However, since you're already using CBWFQ FQ(?), for a T3 you might want to increase the queue depth from the default(?) of 64. You might allow enough packets to hold half to a full BDP (bandwidth-delay product).
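As a rough sizing example, assuming (purely for illustration) a 20 ms round-trip time to the DR site, which you would want to measure:

BDP = 44,210,000 bit/s x 0.020 s = ~884 kbit = ~110 kbytes
~110,000 bytes / 1,500-byte packets = ~74 packets

With that assumed RTT, half to a full BDP works out to roughly 37-74 packets, so the default of 64 is already near the top of that range; a longer measured RTT would push the target higher.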
09-03-2009 12:47 PM
Thanks Joseph,
How do I change the threshold?
"hold-queue" at the interface level changed the max and not the threshold.
09-04-2009 06:38 AM
See if your platform/IOS supports "queue-limit" under class-default, and if so, whether the FQ is modified by it.
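If it is supported, the change would be along these lines (the policy-map name is a placeholder and 128 packets is only an example value; verify on your platform whether queue-limit and fair-queue can coexist in class-default):

policy-map WAN-EDGE-OUT
 class class-default
  fair-queue
  ! raise the per-class queue depth from the default of 64 packets
  queue-limit 128

You can confirm the applied values afterwards with "show policy-map interface Serial1/0".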
09-04-2009 11:13 AM
You are like an encyclopedia, Joseph.
09-02-2010 10:16 AM
Hey there, ran across this in a Google search. I was seeing incrementing output drops on a FastE 0/0 interface on a 2811 router with only 137k/sec being output - obviously not a usage problem. I tracked it down to a policy map that limited a certain server to 128k/sec to prevent update traffic from clogging our WAN links - the exceeded traffic that was dropped correlated directly with the number of output drops we were seeing. So if you see output drops during low usage, check whether you have a policy-map applied.
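A quick way to confirm that case (the interface name here is a placeholder): compare the policer's drop counters against the interface's total output drops.

show policy-map interface FastEthernet0/0 output
show interfaces FastEthernet0/0 | include output drops

If a policed class shows its exceeded/violated drop count climbing in step with the interface's output drop counter, the policer is the source of the drops.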