04-04-2013 04:09 AM - edited 03-07-2019 12:38 PM
Hello all,
We have two 6509Es as our core switches. Recently I noticed a high output queue drop rate on some connections.
These 4 x 2 gigabit interfaces are connected to our blade enclosure, which consists of 4 x WS-CBS3120X-S. The utilization of the links is quite low (~60 Mbps) when I see the drops increase. All the links are fiber (SFP) and the distance between the core switches and the enclosure is about 15-20 m.
Can anyone help me figure out what the cause of the drops is? I am not aware of any service degradation on the part of the servers.
No CRC errors, collisions, etc. on the interfaces; only the drops.
The line card is a WS-X6748-SFP, but other interfaces don't seem to be experiencing any problems.
Thank you in advance,
Katerina
04-04-2013 06:44 PM
Hi Katerina,
I am not quite sure whether QoS has been enabled on the device or not.
Second, a common cause of this is traffic from a high-bandwidth link being
switched to a lower-bandwidth link, or traffic from multiple inbound links
being switched to a single outbound link. For example, if a large amount of
bursty traffic comes in on a gigabit interface and is switched out to a
100 Mbps interface, output drops may increment on the 100 Mbps interface.
This is because the output queue on that interface is overwhelmed by the
excess traffic due to the speed mismatch between the inbound and outbound
bandwidths.
If at any instantaneous moment we attempt to transmit more traffic out of
the egress port than can fit on the wire, you will see output drops.
Output drops simply indicate that there was more traffic to be sent to the
wire than the nominal capacity of the interface allows, and therefore the
interface has run out of tx buffers. If ingress traffic is at line rate and
the ingress module then pushes two labels (adds extra bytes), the egress
interface becomes oversubscribed. Another possibility is that multiple
ingress interfaces are sending traffic to one egress interface, which
creates a natural speed mismatch; with bursty traffic the egress interface
may again run out of tx buffers and drop frames.
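The buffer-exhaustion mechanism described above can be sketched with a toy simulation. This is a hypothetical illustration, not Cisco's actual queuing code; the function name, buffer size, and traffic numbers are all assumptions chosen for the example:

```python
# Hypothetical sketch: a fixed-size egress buffer drained at a constant
# rate, fed by bursty arrivals. Shows how a burst can cause drops even
# when the *average* arrival rate is below the drain rate.

def simulate_egress(arrivals_per_tick, drain_per_tick, buffer_size):
    """Count packets dropped when a burst overflows the tx buffer."""
    queued = 0
    drops = 0
    for arriving in arrivals_per_tick:
        queued += arriving
        if queued > buffer_size:            # tx buffers exhausted
            drops += queued - buffer_size   # excess packets are dropped
            queued = buffer_size
        queued = max(0, queued - drain_per_tick)  # wire drains the queue
    return drops

# Average arrival is 6.8 packets/tick, below the drain rate of 10,
# yet the single 50-packet burst overflows the 20-packet buffer.
burst = [2, 2, 50, 2, 2, 2, 2, 2, 2, 2]
print(simulate_egress(burst, drain_per_tick=10, buffer_size=20))  # -> 30
```

The same traffic volume spread evenly across the ticks would produce zero drops, which is exactly the average-vs-instantaneous distinction discussed below.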
You can also confirm that the issue is caused by a burst of traffic with
the txload counter: if it shows high utilization (e.g. txload 234/255)
whenever drops are reported on the interfaces, that is an indication of
buffer usage.
http://www.cisco.com/en/US/products/hw/switches/ps700/products_tech_note09186a008015bfd6.shtml#backinfo
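As a rough sketch of what that counter means: txload maps utilization onto a 0-255 scale, so 234/255 is roughly 92% of the configured bandwidth. Note this is a simplification; IOS computes load as an exponentially decayed average over the load-interval, not an instantaneous ratio:

```python
# Illustrative only: scale an average tx rate onto the 0..255 load
# scale reported by 'show interface' (txload N/255).

def txload(tx_bits_per_sec, bandwidth_bits_per_sec):
    """Approximate the txload value for a given tx rate and bandwidth."""
    return round(tx_bits_per_sec * 255 / bandwidth_bits_per_sec)

# The 55 Mb/s output rate from the example further below, on a 100 Mb/s link:
print(txload(55_324_000, 100_000_000))   # -> 141
print(txload(100_000_000, 100_000_000))  # -> 255 (link fully utilized)
```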
I also want to clarify a few concepts regarding switches and output drops.
Output drops represent egress traffic dropped due to momentarily exceeded
wire capacity.
Common causes:
- Many links sending traffic to a single link
- A microburst exceeding link capacity
- Oversubscription of line cards (on modular switches)
Regarding link utilization, output drops are not the result of sending
more than 1 Gb of traffic in a second. A second is a very long time, and in
order to send 1 Gb/s of data, every single bit would have to be accommodated
on the medium perfectly, with all devices transmitting at very specific,
synchronized moments. Since in the real world all devices transmit whenever
they need to, we have to buffer packets that arrive at the same time so they
can be evenly distributed on the link.
In other words, when at a specific moment within a second more than one bit
needs to use the medium, we have to buffer the extra bits and place them on
the wire as soon as the medium is available again. If at that specific
moment the amount of bits exceeds the available buffers, an output drop is
generated. That is why we can have 3% bandwidth utilization and yet see
output drops. A reported 30 Mb/s is just an average of what happened within
a second; it does not show whether, of those 30 Mb, 10 Mb arrived in the
first millisecond or less. This happens when the traffic is bursty, and it
can only be measured with sniffer traces that can produce real-time graphs.
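One hypothetical way to post-process such a trace is to bucket packet records into millisecond bins and flag any bin that exceeds the per-millisecond line-rate budget. The function name, bin size, and the synthetic trace below are assumptions for illustration:

```python
# Sketch: detect microbursts in a list of (timestamp_sec, size_bytes)
# records, e.g. exported from a sniffer capture.

from collections import defaultdict

def find_microbursts(packets, link_bps, bin_ms=1):
    """Return the millisecond bins whose traffic exceeds the link's per-bin capacity."""
    budget_bits = link_bps * bin_ms / 1000     # bits the wire can carry per bin
    bins = defaultdict(int)
    for ts_sec, size_bytes in packets:
        bins[int(ts_sec * 1000 / bin_ms)] += size_bytes * 8
    return {b: bits for b, bits in bins.items() if bits > budget_bits}

# 20 x 1500-byte frames landing in the same millisecond = 240 kb,
# more than the 100 kb/ms a 100 Mb/s link can drain in that time.
trace = [(0.0050 + i * 1e-6, 1500) for i in range(20)]
print(find_microbursts(trace, link_bps=100_000_000))  # -> {5: 240000}
```

The trace above averages only 240 kb over any full second, far below line rate, yet a drop would still occur; this is the same point the Wireshark example below makes with its 1 s versus 0.001 s tick intervals.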
Interface tx pps/bps may be relatively low, or well below the maximum
interface capacity. A combination of traffic bursts and a sustained traffic
rate can cause output drops. Besides, the bps (bits per second) and pps
(packets per second) interface counters do not reflect instantaneous load
because of the 30-second sliding window.
Example:
A customer is seeing output drops on a 100 Mb/s link. The average
utilization of the link is only 55 Mb/s.
Switch#show int f1/24 | i duplex|output drops|rate
Full-duplex, 100Mb/s, media type is 10/100BaseTX
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 15640
5 minute input rate 5517000 bits/sec, 4269 packets/sec
5 minute output rate 55324000 bits/sec, 6181 packets/sec
Get a Wireshark capture of traffic on the link when the issue is occurring.
Tick interval = 1 second
Units in bits/tick
Over 20 s, the graph averages 50 Mb/s on a 100 Mb/s link.
The customer claims that this link is not being over-utilized.
Tick interval = 0.001 sec
Units in bits/tick
100 Mb/sec = 100 kb/msec
On a millisecond scale, we can see that the link often bursts over the
line rate. These packets will be dropped and recorded as output drops on
the egress interface of the switch.
Note: 1 Gb/sec = 1 Mb/msec
In summary, output drops are not due to hardware or software failures but
to traffic. If the traffic causing the drops is not expected, then we need
to troubleshoot why those packets are reaching the interface (spanning-tree
loop, unicast flooding, routing misconfiguration, etc.). If the traffic is
expected, then we need to increase the available bandwidth (e.g. with a
port-channel) so the buffers are used less.
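If port-channeling is the chosen remedy, a minimal IOS configuration sketch looks like this. The interface range, channel-group number, and LACP mode here are assumptions for illustration; adjust them to the environment, and both ends of the bundle must use compatible modes:

```
! Sketch only - interface and channel-group numbers are examples
interface range GigabitEthernet1/1 - 2
 channel-group 1 mode active    ! active = LACP
!
interface Port-channel1
 switchport
 switchport mode trunk
```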
HTH
Regards
Inayath
*Please rate all useful posts.
04-08-2013 12:39 AM
Hi Inayath and thank you very much for your reply.
Since the tx rate is low, I will assume there is bursty traffic, but that is something I will have to troubleshoot. I was also looking into maximizing the bandwidth with port-channels, so that is probably the direction I will follow.
The truth is that we do have unicast flooding (due to the network layout), and unfortunately most of the heavily utilized VLANs go out through these interfaces. I will also try to prune the VLANs that are not needed on the specific blades!
Good to know that it is not a hardware or software problem.
Thanks!