brettp
Beginner

Output drops -- link is not saturated, is it really due to bursts?

I have a switch that shows a lot of output drops on just about every interface that has a device connected to it (VM hosts with lots of servers). The main interface in question is an uplink to an ASA, essentially the gateway for the devices on the switch, and it shows far more drops than the others (which could be expected). That interface is full duplex, 1 Gbps. If I sit there and run show int every few seconds, the drop counter increments almost continually, but the input rate (averaged over 30 seconds, so I'm kind of just guessing) doesn't add up. It's never anywhere near a gig. I never see anything in the output queue either. I use PRTG to monitor traffic and that interface never shows more than around 150 Mbps. I understand the polling interval may prevent me from seeing bursts, or maybe not. Are bursts really creating all of these drops? They are nearly constant... I would expect to see something somewhere explaining them. It's also perplexing that the other interfaces show many drops as well. Can this be a hardware issue with the switch? There are no errors, nothing... I find the burst theory hard to believe because all of these servers would not be generating that much traffic at any given time. People would be pulling files through these links, so a large amount of data can make sense, but why would TCP allow it if it was literally one connection (one client to one server)? Any insight is appreciated. Thanks!

15 REPLIES
STEPAN JANKOVIC
Beginner

Hello Brettp :-)

There are two options:

1) The counters are wrong. It may be a bug. Use the Cisco Bug Search Tool, https://bst.cloudapps.cisco.com/bugsearch/?referring_site=mm , and enter something like "3650 output drops" (or whatever kind of switch you have).

2) The counters are true. Since you stated that you normally don't have much traffic, there are two possibilities: 2a) the switch has trouble with its buffers (what type is it?), or 2b) since the output drops are seen on many interfaces, there may be an intermittent traffic storm. This can be diagnosed by capturing a sample of traffic on these ports (a notebook with Wireshark) and checking whether it is all legitimate traffic. There may be some parasitic broadcast, multicast or unknown unicast.

Hope that helps :-)

Stepan

Thank you for the response.  Queuing is just FIFO. I checked for possible bugs regarding inaccurate output drops, but there doesn't seem to be any. I ran wireshark and observed the multicast/broadcast traffic and I saw nothing out of the ordinary. I suppose the only option left is bursts, but I really find that hard to believe... hmm...

Can you please copy/paste the output of the following commands:

show interface "interface" summary 

show queueing interface "interface"

 

Thanks,

Calin

Thank you for the reply. I have included the show int output as well as the output you requested. I cleared the counters just before running the commands, to illustrate that there were already 18 drops soon after clearing them. Based on the output, it looks like the link is saturated, since the drops are from the output queue... so it would have to be a burst, since the link isn't generally saturated?

 

SWITCH#show queueing int g1/0/1
Interface GigabitEthernet1/0/1 queueing strategy: none

SWITCH#show int g1/0/1
GigabitEthernet1/0/1 is up, line protocol is up (connected)
  Hardware is Gigabit Ethernet, address is b4a4.e3ab.4181 (bia b4a4.e3ab.4181)
  Description: Uplink to Primary XOASA
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 8/255, rxload 6/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 00:00:01, output hang never
  Last clearing of "show interface" counters 00:01:43
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 18
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  30 second input rate 26380000 bits/sec, 3564 packets/sec
  30 second output rate 34062000 bits/sec, 5059 packets/sec
     389531 packets input, 360361208 bytes, 0 no buffer
     Received 65 broadcasts (0 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
     535531 packets output, 442964566 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

SWITCH#show int g1/0/1 summary

 *: interface is up
 IHQ: pkts in input hold queue     IQD: pkts dropped from input queue
 OHQ: pkts in output hold queue    OQD: pkts dropped from output queue
 RXBS: rx rate (bits/sec)          RXPS: rx rate (pkts/sec)
 TXBS: tx rate (bits/sec)          TXPS: tx rate (pkts/sec)
 TRTL: throttle count

  Interface                   IHQ       IQD       OHQ       OQD      RXBS      RXPS      TXBS      TXPS      TRTL
-----------------------------------------------------------------------------------------------------------------
* GigabitEthernet1/0/1          0         0         0        18  26210000      3567  35061000      5133         0                     
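
To put those counters in proportion, here is a small Python sketch that turns them into an average utilization and a drop rate. The numbers are copied from the show int output above; the function name `summarize` and the choice to express drops per million transmitted packets are illustrative assumptions, not anything the switch reports itself.

```python
# Values copied from the "show int g1/0/1" output above.
LINK_BPS = 1_000_000_000            # BW 1000000 Kbit/sec
OUTPUT_RATE_BPS = 34_062_000        # 30 second output rate
PACKETS_OUTPUT = 535_531            # packets output
OUTPUT_DROPS = 18                   # Total output drops
SECONDS_SINCE_CLEAR = 1 * 60 + 43   # Last clearing of counters 00:01:43


def summarize(link_bps, rate_bps, packets_out, drops, elapsed_s):
    """Return average utilization and drop figures for one interface."""
    return {
        # Share of the line rate used, averaged over the load interval.
        "utilization_pct": 100.0 * rate_bps / link_bps,
        # Drops relative to the packets that were actually transmitted.
        "drops_per_million_pkts": 1_000_000.0 * drops / packets_out,
        # How fast the drop counter is climbing.
        "drops_per_minute": 60.0 * drops / elapsed_s,
    }


stats = summarize(LINK_BPS, OUTPUT_RATE_BPS, PACKETS_OUTPUT,
                  OUTPUT_DROPS, SECONDS_SINCE_CLEAR)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The average utilization comes out at a few percent of line rate, which is why a 30 second average (or a PRTG graph) cannot by itself explain the drops: any congestion has to be happening on a much shorter timescale than the averaging window.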

 

I replied but it seems like it disappeared? Here is the output you requested again (along with show int, to illustrate the 18 drops that appeared shortly after I cleared the counters). Based on what I'm seeing, the packets are just dropping from the output queue, but based on the numbers the link is nowhere near saturated... thus indicating a burst... which in my opinion is not possible based on the traffic traversing the link. I have also included a screenshot showing the traffic graph...

 

Screen Shot 2018-07-26 at 12.29.58 PM.png

 


All looks OK...

One idea is microbursts, but how much microbursting can you have on an interface that is using this little of its capacity?

How's the load in general per switch?

 

Calin

The switch "appears" fine except for the drops. CPU, memory, load... it all checks out. I'm thinking the only answers are 1. bursty traffic (which I will have to investigate; unfortunately, I can't get stats down to the millisecond, but I see there's a way to possibly detect it with Wireshark, so I'll have to read about that) or 2. the switch is defective in some way -- which is possible, but the behavior doesn't typically point to a hardware issue from what I can tell. It's not the cabling or the actual physical switch port. As hard as it is for me to believe, I am leaning towards bursts... just very strange in my opinion.
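
As a rough illustration of what millisecond-level detection involves, the Python sketch below bins packets into 1 ms buckets and reports the buckets whose rate exceeds a threshold. It assumes the capture has already been exported as (timestamp, frame length) pairs, for example with something like `tshark -r capture.pcap -T fields -e frame.time_epoch -e frame.len` (the file name is hypothetical). The function names and the synthetic sample data are made up for the example and are not taken from the Cisco article.

```python
from collections import defaultdict


def rate_per_bin(packets, bin_seconds=0.001):
    """Map bin index -> bits/sec, for (timestamp_s, length_bytes) pairs."""
    bits = defaultdict(int)
    for timestamp, length_bytes in packets:
        bits[int(timestamp / bin_seconds)] += length_bytes * 8
    return {index: total / bin_seconds for index, total in bits.items()}


def bins_over(packets, threshold_bps, bin_seconds=0.001):
    """Return sorted (bin index, bits/sec) for bins above the threshold."""
    rates = rate_per_bin(packets, bin_seconds)
    return sorted((i, r) for i, r in rates.items() if r > threshold_bps)


# Synthetic data: two isolated frames plus 100 full-size frames
# packed into a single millisecond (purely illustrative).
sample = [(0.0005, 1500), (0.0505, 1500)]
sample += [(0.0102 + i * 0.000005, 1500) for i in range(100)]

# Average over a 100 ms window versus the per-millisecond peaks.
average_bps = sum(length * 8 for _, length in sample) / 0.1
bursts = bins_over(sample, threshold_bps=1_000_000_000)

print(f"average: {average_bps / 1e6:.2f} Mbit/s")
for index, rate in bursts:
    print(f"bin {index}: {rate / 1e6:.0f} Mbit/s")
```

Where the capture is taken matters: a capture of the egress wire itself cannot show more than line rate, so the threshold only makes sense against the traffic offered to the port (for example, the sum of what arrives from the ingress ports).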

If you have the possibility to open a TAC case, do it. I'm curious about the outcome.

Joseph W. Doherty
Hall of Fame Expert

". . . but why would TCP allow it if it was literally one connection (one client to one server?) Any insight is appreciated."

Are you familiar with TCP "slow start"?

Of course, understand, if you only have one flow, and its input is the same bandwidth as the output, although the source host will transmit in bursts, those bursts will not queue on egress. However, if you have multiple ingress to one egress or an ingress with greater bandwidth capacity than the egress, the above can "micro burst". Interestingly, the micro burst drops can keep the link from being saturated, as TCP senders will slow their transmission rate when the drops are detected.
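
The effect described above can be sketched with a toy FIFO model in Python. One tick is the time to send one frame at the egress line rate; each sender delivers one frame per tick while bursting, and the egress port drains one frame per tick. The function name, the burst pattern (all senders bursting at the same moment) and the 40-frame queue limit are illustrative assumptions, and the model deliberately ignores TCP back-off.

```python
def simulate(senders, burst_frames, period_ticks, queue_limit, total_ticks):
    """Tail-drop FIFO fed by several same-speed ingress ports.

    Returns (offered, sent, dropped) frame counts.
    """
    queue = offered = sent = dropped = 0
    for tick in range(total_ticks):
        # Every sender bursts at line rate at the start of each period.
        if tick % period_ticks < burst_frames:
            for _ in range(senders):
                offered += 1
                if queue < queue_limit:
                    queue += 1
                else:
                    dropped += 1      # queue full: tail drop
        # The egress port transmits at most one frame per tick.
        if queue > 0:
            queue -= 1
            sent += 1
    return offered, sent, dropped


for n in (1, 4):
    offered, sent, dropped = simulate(senders=n, burst_frames=50,
                                      period_ticks=10_000, queue_limit=40,
                                      total_ticks=10_000)
    print(f"{n} sender(s): offered={offered} sent={sent} dropped={dropped} "
          f"egress utilization={100.0 * sent / 10_000:.2f}%")
```

With one sender at the same speed as the egress port, nothing queues and nothing drops. With four senders bursting together, the queue overflows within a few ticks even though the egress link is idle almost all of the time, which is the same pattern as drops on a link that looks lightly loaded on average.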

So I read up on the Cisco site how to detect microbursts with Wireshark, and sure enough, that seems to be the culprit. It's amazing what you can dig up -- I was almost positive that wasn't the case. I'd like to mitigate these drops if possible... I'm currently using a C2960S... Beyond the buffers, the switch is performing fine. I know one suggestion could be to fine-tune the buffers... but I'm wondering if maybe getting another similar Catalyst switch might be a better option. Increasing bandwidth is not an option. According to the site, the 2960S has 2 MB egress buffers and the 2960X has 4 MB. If my calculations are correct, during these bursts I'm seeing about 100,000 bits per .001 second above the limit. I'd imagine doubling the buffers would be adequate to correct the issue... maybe... Do you have any input? I'm new to this...
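
For a back-of-the-envelope feel for those numbers: 100,000 bits per millisecond above the limit is 100 Mbit/s of excess, and a buffer absorbs excess traffic for roughly buffer size divided by excess rate. The Python sketch below does that arithmetic. It assumes "2 MB" and "4 MB" mean 2 and 4 times 1024 * 1024 bytes, and the `share` parameter (the fraction of the switch-wide buffer that one congested port can actually use) is a hypothetical knob, since the buffer is shared among ports and queues and the real per-port figure is not given in this thread.

```python
def absorb_seconds(buffer_bytes, excess_bps, share=1.0):
    """Seconds a buffer can absorb traffic arriving above line rate.

    share is the (assumed) fraction of the buffer usable by this port.
    """
    return buffer_bytes * 8 * share / excess_bps


EXCESS_BPS = 100_000 / 0.001   # 100,000 bits per millisecond

for label, size in (("2 MB", 2 * 1024 * 1024), ("4 MB", 4 * 1024 * 1024)):
    for share in (1.0, 0.05):  # whole buffer vs. an assumed 5% slice
        ms = 1000 * absorb_seconds(size, EXCESS_BPS, share)
        print(f"{label}, share {share:.0%}: {ms:.1f} ms of burst absorbed")
```

Doubling the buffer doubles the burst duration that can be absorbed, but the absolute figure depends heavily on how much of the buffer the busy port is allowed to use, which is an allocation question rather than a total-size question.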

It's hard to predict exactly what doubling the buffers will do, although any additional buffer allocation will likely mitigate microburst drops. Even with 4 MB of RAM you may have drops, as the 3560/3750 series have 4 MB per 24 copper ports and they are also a bit infamous for output drops unless you tune the buffers.

What I've found often helps is buffer tuning such that the switch adopts the buffering architecture of the earlier 3K switches, which worked from a common pool. The issue with the later series is that they reserve buffer RAM for interface egress queues, even when it's not used. The idea is that egress queues will always have some buffer RAM. This might make sense if all your interfaces are equally busy, but if they are not, it restricts the buffer RAM available to the busy interfaces.

Thanks for the input. I think I'll grab a 2960-X switch and hope for the best. Obviously, it's a much better switch than the 2960-S, so I can't imagine it would make the issue any worse and I'll have a new switch!

The -X shouldn't perform worse than the -S, but again, don't be surprised if, without buffer tuning, you don't see a huge decrease in output drops.

Is it even possible to tune the buffers if I’m just using fifo queuing on this switch? Any articles I dig up are strictly for when QOS is implemented. Thanks.