04-04-2013 04:09 AM - edited 03-07-2019 12:38 PM
Hello all,
We have two 6509Es as our core switches. Recently I noticed a high output queue drop rate on some connections.
These 4 x 2 gigabit interfaces are connected to our blade enclosure, which consists of 4 x WS-CBS3120X-S. The utilization of the links is quite low (~60 Mbps) when I see the drops increase. All the links are fiber (SFP) and the distance between the core switches and the enclosure is about 15-20 m.
Can anyone help me figure out what the cause of the drops is? I am not aware of any service degradation on the part of the servers.
No CRC errors, collisions, etc. on the interfaces; only the drops.
The line card is a WS-X6748-SFP, but other interfaces don't seem to be experiencing any problems.
Thank you in advance,
Katerina
04-04-2013 06:44 PM
Hi Katerina,
I am not quite sure whether QoS has been enabled on the device or not.
Second, a common cause of this is traffic from a high-bandwidth link being
switched to a lower-bandwidth link, or traffic from multiple inbound links
being switched to a single outbound link. For example, if a large amount of
bursty traffic comes in on a gigabit interface and is switched out to a
100 Mbps interface, output drops may increment on the 100 Mbps interface.
This is because the output queue on that interface is overwhelmed by the
excess traffic due to the speed mismatch between the inbound and outbound
bandwidths.
If at any instantaneous moment we attempt to transmit more traffic out of
the egress port than can fit on the wire, you will see output drops.
Output drops simply indicate that there was more traffic to be sent to the
wire than the nominal capacity of the interface allows, and therefore the
interface has run out of tx buffers. If ingress traffic is at line rate and
the ingress module then pushes two labels (adds extra bytes), the egress
interface becomes oversubscribed. Another possibility is that multiple
ingress interfaces are sending traffic to one egress interface, which
creates a natural speed mismatch; with bursty traffic the egress interface
may again run out of tx buffers and drop frames.
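The buffer-exhaustion mechanism described above can be sketched with a toy simulation. This is a hypothetical illustration, not Cisco's actual queuing code; the function name, buffer size, and traffic numbers are all assumptions chosen for the example:

```python
# Hypothetical sketch: a fixed-size egress buffer drained at a constant
# rate, fed by bursty arrivals. Shows how a burst can cause drops even
# when the *average* arrival rate is below the drain rate.

def simulate_egress(arrivals_per_tick, drain_per_tick, buffer_size):
    """Count packets dropped when a burst overflows the tx buffer."""
    queued = 0
    drops = 0
    for arriving in arrivals_per_tick:
        queued += arriving
        if queued > buffer_size:            # tx buffers exhausted
            drops += queued - buffer_size   # excess packets are dropped
            queued = buffer_size
        queued = max(0, queued - drain_per_tick)  # wire drains the queue
    return drops

# Average arrival is 6.8 packets/tick, below the drain rate of 10,
# yet the single 50-packet burst overflows the 20-packet buffer.
burst = [2, 2, 50, 2, 2, 2, 2, 2, 2, 2]
print(simulate_egress(burst, drain_per_tick=10, buffer_size=20))  # -> 30
```

The same traffic volume spread evenly across the ticks would produce zero drops, which is exactly the average-vs-instantaneous distinction discussed below.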
You can also confirm that the issue is caused by a burst of traffic with
the txload counter: if it shows high utilization (e.g. txload 234/255)
whenever drops are reported on the interfaces, that is an indication of
buffer usage.
http://www.cisco.com/en/US/products/hw/switches/ps700/products_tech_note09186a008015bfd6.shtml#backinfo
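As a rough sketch of what that counter means: txload maps utilization onto a 0-255 scale, so 234/255 is roughly 92% of the configured bandwidth. Note this is a simplification; IOS computes load as an exponentially decayed average over the load-interval, not an instantaneous ratio:

```python
# Illustrative only: scale an average tx rate onto the 0..255 load
# scale reported by 'show interface' (txload N/255).

def txload(tx_bits_per_sec, bandwidth_bits_per_sec):
    """Approximate the txload value for a given tx rate and bandwidth."""
    return round(tx_bits_per_sec * 255 / bandwidth_bits_per_sec)

# The 55 Mb/s output rate from the example further below, on a 100 Mb/s link:
print(txload(55_324_000, 100_000_000))   # -> 141
print(txload(100_000_000, 100_000_000))  # -> 255 (link fully utilized)
```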
I also want to clarify a few concepts regarding switches and output drops.
Output drops represent egress traffic dropped due to momentarily exceeded
wire capacity.
Common causes:
- Many links sending traffic to a single link
- A microburst exceeding link capacity
- Oversubscription of line cards (on modular switches)
Regarding link utilization, output drops are not the result of sending
more than 1 Gb of traffic in a second. A second is a very long time, and in
order to send 1 Gb/s of data, every single bit would have to be accommodated
on the medium perfectly, with all devices transmitting at very specific,
synchronized moments. Since in the real world all devices transmit whenever
they need to, we have to buffer packets that arrive at the same time so they
can be evenly distributed on the link.
In other words, when at a specific moment within a second more than one bit
needs to use the medium, we have to buffer the extra bits and place them on
the wire as soon as the medium is available again. If at that specific
moment the amount of bits exceeds the available buffers, an output drop is
generated. That is why we can have 3% bandwidth utilization and yet see
output drops. A reported 30 Mb/s is just an average of what happened within
a second; it does not show whether, of those 30 Mb, 10 Mb arrived in the
first millisecond or less. This happens when the traffic is bursty, and it
can only be measured with sniffer traces that can produce real-time graphs.
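One hypothetical way to post-process such a trace is to bucket packet records into millisecond bins and flag any bin that exceeds the per-millisecond line-rate budget. The function name, bin size, and the synthetic trace below are assumptions for illustration:

```python
# Sketch: detect microbursts in a list of (timestamp_sec, size_bytes)
# records, e.g. exported from a sniffer capture.

from collections import defaultdict

def find_microbursts(packets, link_bps, bin_ms=1):
    """Return the millisecond bins whose traffic exceeds the link's per-bin capacity."""
    budget_bits = link_bps * bin_ms / 1000     # bits the wire can carry per bin
    bins = defaultdict(int)
    for ts_sec, size_bytes in packets:
        bins[int(ts_sec * 1000 / bin_ms)] += size_bytes * 8
    return {b: bits for b, bits in bins.items() if bits > budget_bits}

# 20 x 1500-byte frames landing in the same millisecond = 240 kb,
# more than the 100 kb/ms a 100 Mb/s link can drain in that time.
trace = [(0.0050 + i * 1e-6, 1500) for i in range(20)]
print(find_microbursts(trace, link_bps=100_000_000))  # -> {5: 240000}
```

The trace above averages only 240 kb over any full second, far below line rate, yet a drop would still occur; this is the same point the Wireshark example below makes with its 1 s versus 0.001 s tick intervals.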
Interface tx pps/bps may be relatively low, or well below the maximum
interface capacity. A combination of traffic bursts and a sustained traffic
rate can cause output drops. Besides, the bps (bits per second) and pps
(packets per second) interface counters do not reflect instantaneous load
because of the 30-second sliding window.
Example:
A customer is seeing output drops on a 100 Mb/s link. The average
utilization of the link is only 55 Mb/s.
Switch#show int f1/24 | i duplex|output drops|rate
Full-duplex, 100Mb/s, media type is 10/100BaseTX
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 15640
5 minute input rate 5517000 bits/sec, 4269 packets/sec
5 minute output rate 55324000 bits/sec, 6181 packets/sec
Get a Wireshark capture of traffic on the link when the issue is occurring.
Tick interval = 1 second
Units in bits/tick
Over 20 s, the graph averages 50 Mb/s on a 100 Mb/s link.
The customer claims that this link is not being over-utilized.
Tick interval = 0.001 sec
Units in bits/tick
100 Mb/sec = 100 kb/msec
On a millisecond scale, we can see that the link often bursts over the
line rate. These packets will be dropped and recorded as output drops on
the egress interface of the switch.
Note: 1 Gb/sec = 1 Mb/msec
In summary, output drops are not due to hardware or software failures but
to traffic. If the traffic causing the drops is not expected, then we need
to troubleshoot why those packets are reaching the interface (spanning-tree
loop, unicast flooding, routing misconfiguration, etc.). If the traffic is
expected, then we need to increase the available bandwidth (e.g. with a
port-channel) so the buffers are used less.
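If port-channeling is the chosen remedy, a minimal IOS configuration sketch looks like this. The interface range, channel-group number, and LACP mode here are assumptions for illustration; adjust them to the environment, and both ends of the bundle must use compatible modes:

```
! Sketch only - interface and channel-group numbers are examples
interface range GigabitEthernet1/1 - 2
 channel-group 1 mode active    ! active = LACP
!
interface Port-channel1
 switchport
 switchport mode trunk
```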
HTH
Regards
Inayath
*Please rate all useful posts.
04-08-2013 12:39 AM
Hi Inayath and thank you very much for your reply.
Since the tx rate is low, I will assume there is bursty traffic, but that is something I will have to troubleshoot. I was also looking into maximizing the bandwidth with port-channels, so that is probably the direction I will follow.
The truth is that we do have unicast flooding (due to the network layout), and unfortunately most of the heavily utilized VLANs go out through these interfaces. I will also try to prune the VLANs that are not needed on the specific blades!
Good to know that it is not a hardware or software problem.
Thanks!