
Can't figure out output discards

paul amaral
Level 4

Hi all, I have an interface that constantly has output drops. Looking at the drops, I can confirm they are all output discards. The problem I'm having is that I can't figure out what is causing the output packet drops.

 

The interface is part of a routed VLAN and is set for 100Mb full duplex, connected to an L2 switch also set at 100/full. I already replaced the L2 switch and that didn't make a difference. Also, looking at the L2 switch's interface, there are no input drops or errors at all.

 

I have read a lot of documents on discards and have done as much troubleshooting as I can, and I have not been able to stop this from happening or even determine what packets are being dropped.

I have looked for microbursts using Wireshark and there are none; even at 2Mb you will sometimes see discards. I have increased the output queue to match the input queue and that didn't help. The interface looks clean, with no CRCs/runts etc., and I'm barely touching the 100Mb throughput. Can someone recommend what else I can do and look at to determine what the issue might be?
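
One way to double-check for microbursts is to bin the capture at millisecond granularity, since an IO graph plotted at a 1-second interval averages sub-second spikes away. Below is a minimal Python sketch, assuming scapy is installed and a SPAN capture of the egress port saved under a hypothetical filename:

# Bin a SPAN capture into 1 ms intervals to look for sub-second bursts
# that a 1-second IO graph would hide. The capture filename is hypothetical.
from collections import Counter
from scapy.all import rdpcap

BIN_MS = 1  # bin width in milliseconds

bins = Counter()
for pkt in rdpcap("span_fa1_36.pcap"):
    bins[int(pkt.time * 1000) // BIN_MS] += len(pkt)

# A 100 Mb/s port drains roughly 12,500 bytes per millisecond; any bin
# well above that means frames arrived faster than the port can send them.
line_rate_bytes = 100_000_000 / 8 / 1000 * BIN_MS
for t, size in sorted(bins.items()):
    if size > line_rate_bytes:
        print(f"bin {t} ms: {size} bytes (over the ~{line_rate_bytes:.0f}-byte line-rate budget)")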

Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  1   48  SFM-capable 48-port 10/100 Mbps RJ45   WS-X6548-RJ-45     SAL0710A54G
  2   48  SFM-capable 48-port 10/100 Mbps RJ45   WS-X6548-RJ-45     SAL09444KVM

  7    2  Supervisor Engine 720 (Active)         WS-SUP720-3BXL     SAD084202LK
  8    2  Supervisor Engine 720 (Hot)            WS-SUP720-3BXL     SAL1015JPRZ

 

TIA, Paul

 

Vlan615 is up, line protocol is up
  Hardware is EtherSVI, address is 0015.c7c7.0880 (bia 0015.c7c7.0880)
  Internet address is xxxx
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive not supported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:22, output 00:00:22, output hang never
  Last clearing of "show interface" counters 01:49:53
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 1337000 bits/sec, 349 packets/sec
  5 minute output rate 1938000 bits/sec, 344 packets/sec
  L2 Switched: ucast: 214 pkt, 14696 bytes - mcast: 12 pkt, 768 bytes
  L3 in Switched: ucast: 1582724 pkt, 567827172 bytes - mcast: 0 pkt, 0 bytes mcast
  L3 out Switched: ucast: 1625654 pkt, 1115273430 bytes mcast: 0 pkt, 0 bytes
     1584871 packets input, 568106584 bytes, 0 no buffer
     Received 12 broadcasts (0 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     1623842 packets output, 1114306193 bytes, 0 underruns
     0 output errors, 0 interface resets
     0 output buffer failures, 0 output buffers swapped out

 

FastEthernet1/36 is up, line protocol is up (connected)
  Hardware is C6k 100Mb 802.3, address is 0009.11f6.35b3 (bia 0009.11f6.35b3)
  Description: Spamcan new port - pa testing
  MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
     reliability 255/255, txload 2/255, rxload 2/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 100Mb/s, media type is 10/100BaseTX
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:25, output never, output hang never
  Last clearing of "show interface" counters 01:50:32
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 2468
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  30 second input rate 942000 bits/sec, 289 packets/sec
  30 second output rate 1097000 bits/sec, 271 packets/sec
     1592429 packets input, 569790617 bytes, 0 no buffer
     Received 3434 broadcasts (3422 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
     1626638 packets output, 1105460469 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out

 

sh int fast1/36 counters error

Port        Align-Err   FCS-Err  Xmit-Err   Rcv-Err  UnderSize  OutDiscards
Fa1/36              0         0         0         0          0         2468

Port       Single-Col  Multi-Col  Late-Col  Excess-Col  Carri-Sen  Runts  Giants
Fa1/36              0          0         0           0          0      0       0

Port      SQETest-Err  Deferred-Tx  IntMacTx-Err  IntMacRx-Err  Symbol-Err
Fa1/36              0            0             0             0           0

 

Interface FastEthernet1/36 queueing strategy:  Weighted Round-Robin
  Port QoS is enabled
  Trust boundary disabled

  Port is untrusted
  Extend trust state: not trusted [COS = 0]
  Default COS is 0
    Queueing Mode In Tx direction: mode-cos
    Transmit queues [type = 1p3q1t]:
    Queue Id    Scheduling  Num of thresholds
    -----------------------------------------
       1         WRR                 1
       2         WRR                 1
       3         WRR                 1
       4         Priority            1

    WRR bandwidth ratios:  100[queue 1] 150[queue 2] 200[queue 3]

    queue random-detect-min-thresholds
    ----------------------------------
      1    70[1]
      2    70[1]
      3    70[1]

    queue random-detect-max-thresholds
    ----------------------------------
      1    100[1]
      2    100[1]
      3    100[1]

    WRED disabled queues:

    queue thresh cos-map
    ---------------------------------------
    1     1      0 1
    2     1      2 3 4
    3     1      6 7
    4     1      5

25 Replies

Leo Laohoo
Hall of Fame

The output doesn't really help because it is taken from a Virtual Interface.  I'm keen to know which PHYSICAL interface the output drops are coming from. 

Leo, I have pasted the physical interface stats, FastEthernet1/36; look at the original message.

thanks, p

Output discards on a FastEthernet port???  What client is connected to this port?  

Yes, on a FastEthernet port set for full/100. Right now there is a Cisco Catalyst set for full/100 with no errors. It's an L2 switch with about 5 Linux SMTP servers behind it.

I just can't figure out what is causing the output drops on the 6500 side.

paul

The 6548 line cards are no match for the Linux servers.  The 6548 line cards are not meant to "live" or "do" data centre work.

So what do you think is happening? We have other servers/switches connected to the 6548 line card that see no output drops.

Also, the interface is connected to another L2 switch, which sees no output drops. The servers are not connected directly to the 6548 line card.

What I think is happening is each of the Linux servers is overwhelming the notoriously shallow memory buffer of the 6548 line cards.

The solutions to this are:

1.  Apply QoS; or

2.  6748 line cards.

Unfortunately, QoS is not my strong suit.  One of the Cisco VIPs, Joseph, is good at it; he normally lurks around and, if he's not busy, he'll chime in.

Is there a way to see memory buffers on those cards? Also, shouldn't I see drops across all ports on that card? I'm only seeing them on some ports, the busiest ones.

The other thing is that the L2 switch connected to it is an old Cisco Catalyst 2900 and it shows no output drops. Could that line card really be that bad?

You can use the show buffers input-interface <interface x/x> packet/header command. However, you may be hitting this bug if you are only seeing the drops increase but there is no actual network degradation:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCdz02952/?reffering_site=dumpcr

Hope it helps, best regards!

JC

The buffer command will not show any packet buffer info; I have tried that. Also, the line card I have runs software 12.2, so it's not affected by that bug. It's just really strange to have the output counter increment for no reason, well, none that I can see.

Leo, I have done research and I'm not convinced that the issue is what you described above; I think it's a problem with the hardware queue, which is also why setting the software queue didn't make a difference. I have mls qos enabled and it's using the hardware queues on the line cards.

My question is: the 6548 shows 1088KB for the TX queue buffer while the 6748 shows 1.2MB. Do you think an extra ~300KB will make a difference?

Module           Ports         Rx queue type   Tx queue type   Total buffer   Rx buffer   Tx buffer
WS-X6748-GE-TX   48 x GE TX    1Q8T            1P3Q8T          1.3MB          166KB       1.2MB
WS-X6548-RJ-45   48 x 10/100   1P1Q0T          1P3Q1T          1.1MB          28KB        1088KB
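
As a rough sanity check on whether the larger 6748 TX buffer matters on its own, here is a minimal Python sketch (a back-of-the-envelope calculation under assumed rates, not a model of the ASIC) comparing how long each card's per-port TX buffer can absorb traffic arriving faster than a 100 Mb/s port drains:

# Rough arithmetic only: time for each per-port TX buffer to fill when the
# offered load exceeds the 100 Mb/s drain rate by a given amount. Ignores
# how the card splits the buffer between its queues.
DRAIN_BPS = 100_000_000
BUFFERS = {
    "WS-X6548-RJ-45": 1088 * 1024,              # ~1088 KB TX buffer (table above)
    "WS-X6748-GE-TX": int(1.2 * 1024 * 1024),   # ~1.2 MB TX buffer (table above)
}

for overload_bps in (10_000_000, 50_000_000, 100_000_000):
    fill_rate = overload_bps / 8  # net bytes/s queued beyond what the port drains
    for card, buf in BUFFERS.items():
        print(f"{card}: {overload_bps/1e6:.0f} Mb/s over line rate -> "
              f"buffer lasts ~{buf / fill_rate * 1000:.0f} ms")

At a heavy overload the two figures differ by only a few tens of milliseconds, so the extra buffer alone may not change much; the bigger win from a 6748 would likely be running the link at gig, as discussed below.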

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

an extra ~300KB will make a difference?

Possibly, although a bigger difference using a 6748 might be running at gig rather than 100.  (Does your 2900 support gig on an uplink port?)

Generally, unless you're bumping into some kind of bug, Cisco device queues drop packets when they overflow.  When overall utilization appears low, microbursts are a common cause, although you say you believe this isn't happening.  (How did you obtain the packet dump you used?  If you used Cisco's SPAN, I'm unsure its replication would be accurate enough to represent what the egress port hardware is actually "seeing" frame/packet timing wise.)

As you mention you're using MLS QoS, it's possible some tuning of the interface's QoS settings (e.g., wrr-queue queue-limit and/or wrr-queue bandwidth) might remediate your drops.  (I don't recall how a 6548 shares buffers between its queues.)

Also, shouldn't I see drops across all ports on that card? I'm only seeing them on some ports, the busiest ones.

Since on this particular card, buffers are allocated per port, what you state makes sense, i.e. you only see drops on the busiest ports.

Joseph, it was set to 1000 using an HP switch before I switched it to the 2900 at 100. The thing is, I never see input errors on the 2900 or the HP switch.  Do you think going with the WS-X6748-GE-TX would make a difference? I'm not 100% sure.

I used SPAN and the Wireshark IO graph to look at the bandwidth used in real time. I was seeing output drops at 3-5 Mb and never saw any indication of microbursts, although I'm assuming I should be seeing spikes near the 100 Mb limit, and I'm not seeing anything near that.

I tweaked the WRR queue bandwidth several times (I don't have the queue-limit option) with no effect. Although I have mls qos enabled and dscp trust on that port, most packets destined for this port are SMTP traffic with low-level QoS marking, so the packets never take advantage of the priority queue and are dropped in the 1st queue.

  Packets dropped on Transmit:
    BPDU packets:  0
    que thr     dropped   30-s bytes   peak bytes   5-mins avg bps   peak bps   [cos-map]
    -------------------------------------------------------------------------------------
    1   1          1256            0            0                0          0   [0 1 ]
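
For reference, a tiny Python sketch of the CoS-to-queue mapping taken from the 1p3q1t cos-map shown earlier, which is why low-marked SMTP traffic (CoS 0/1) all lands in queue 1 and never reaches the priority queue:

# CoS-to-TX-queue mapping copied from the "queue thresh cos-map" table above
# (queue 4 is the strict-priority queue on this 1p3q1t port).
COS_TO_TX_QUEUE = {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 4, 6: 3, 7: 3}

for cos in range(8):
    q = COS_TO_TX_QUEUE[cos]
    label = "priority queue" if q == 4 else f"WRR queue {q}"
    print(f"CoS {cos} -> {label}")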

thanks, paul


Joseph, it was set to 1000 using an HP switch before I switched it to the 2900 at 100. The thing is, I never see input errors on the 2900 or the HP switch.  Do you think going with the WS-X6748-GE-TX would make a difference? I'm not 100% sure.

I wouldn't expect egress drops to cause input errors unless the receiving device couldn't deal with 100 Mbps.

Again, a 6748 might help: because of the extra 300K, because you may be able to run at gig rather than 100, and because it should support the queue-limit command too.

What you might try, if you don't have a 6748 port at hand, is a sup port.  The sup720's ports aren't the best for uplinks, but it might be interesting to see if your results vary.

I used SPAN and the Wireshark IO graph to look at the bandwidth used in real time. I was seeing output drops at 3-5 Mb and never saw any indication of microbursts, although I'm assuming I should be seeing spikes near the 100 Mb limit, and I'm not seeing anything near that.

Again, I'm unsure SPAN doesn't distort timings.  What port were you SPANing, the egress port?

Could more than one port be sending traffic to the egress port?  If so, consider that just two concurrent 100 Mbps streams would be sending 200 Mbps of traffic to a 100 Mbps port.  If that happens, the question is whether the concurrent traffic will exceed the allocated buffer space.  If the whole MB of buffer space were available, one would expect it to often be ample, but if the interface reserves it for egress queues, and if you cannot adjust them, then you're more likely to overflow an individual queue's buffers.
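
To make that concrete, here is a toy Python sketch (burst sizes and rates are illustrative assumptions, not measurements from this network) showing how two senders that each average well under 100 Mb/s can still overflow a roughly 1 MB egress buffer whenever their bursts overlap:

# Toy model: two senders occasionally burst at line rate toward one 100 Mb/s
# egress port with an assumed ~1088 KB TX buffer. Each burst is dumped into a
# single 1 ms slot as a coarse stand-in for back-to-back frames.
import random

random.seed(1)
PORT_BPS = 100_000_000
BUF_BYTES = 1088 * 1024
SLOT_S = 0.001
drain_per_slot = PORT_BPS / 8 * SLOT_S

queue = 0.0
offered = 0.0
dropped = 0.0
for slot in range(60_000):          # one simulated minute in 1 ms slots
    for _ in range(2):              # two senders
        if random.random() < 0.002: # a burst starts about twice per second
            burst = 100_000_000 / 8 * 0.05  # ~50 ms at 100 Mb/s
            space = BUF_BYTES - queue
            queue += min(burst, space)
            dropped += max(0.0, burst - space)
            offered += burst
    queue = max(0.0, queue - drain_per_slot)

print(f"average offered load ~{offered * 8 / 60 / 1e6:.1f} Mb/s, "
      f"~{dropped / max(offered, 1) * 100:.2f}% of offered bytes dropped")

The point of the toy isn't the exact percentage; it's that drops can appear at single-digit average utilization when arrivals are bursty, which is consistent with a counter that creeps up while the interface otherwise looks clean.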
