04-22-2016 12:37 PM - edited 03-05-2019 03:52 AM
Hi all, I have an interface that constantly has output drops; looking at the drops I can confirm they are all output discards. The problem I'm having is that I can't figure out what is causing the output packet drops.
The interface is part of a routed VLAN and is set for 100Mb full duplex, connected to an L2 switch set at 100/full as well. I already replaced the L2 switch and that didn't make a difference. Also, looking at the L2 switch's interface, there are no input drops or errors at all.
I have read a lot of documents on discards and have done as much troubleshooting as I can, and I have not been able to stop this from happening or even determine which packets are being dropped.
I have looked for microbursts using Wireshark and found none; even at 2Mb you will sometimes see discards. I have increased the output queue to match the input queue and that didn't help. The interface looks clean with no CRCs, runts, etc., and I'm barely touching the 100Mb throughput. Can someone recommend what else I can do and look at to determine what the issue might be?
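For reference, the output queue was raised roughly like this (a minimal sketch of what was applied; as far as I can tell, hold-queue only changes the software output queue and does not touch the line card's hardware transmit buffers):

interface FastEthernet1/36
 ! match the 2000-packet input queue (assumption: hold-queue was the command used)
 hold-queue 2000 out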
Mod Ports Card Type Model Serial No.
--- ----- -------------------------------------- ------------------ -----------
1 48 SFM-capable 48-port 10/100 Mbps RJ45 WS-X6548-RJ-45 SAL0710A54G
2 48 SFM-capable 48-port 10/100 Mbps RJ45 WS-X6548-RJ-45 SAL09444KVM
7 2 Supervisor Engine 720 (Active) WS-SUP720-3BXL SAD084202LK
8 2 Supervisor Engine 720 (Hot) WS-SUP720-3BXL SAL1015JPRZ
TIA, Paul
Vlan615 is up, line protocol is up
Hardware is EtherSVI, address is 0015.c7c7.0880 (bia 0015.c7c7.0880)
Internet address is xxxx
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive not supported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:22, output 00:00:22, output hang never
Last clearing of "show interface" counters 01:49:53
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 1337000 bits/sec, 349 packets/sec
5 minute output rate 1938000 bits/sec, 344 packets/sec
L2 Switched: ucast: 214 pkt, 14696 bytes - mcast: 12 pkt, 768 bytes
L3 in Switched: ucast: 1582724 pkt, 567827172 bytes - mcast: 0 pkt, 0 bytes mcast
L3 out Switched: ucast: 1625654 pkt, 1115273430 bytes mcast: 0 pkt, 0 bytes
1584871 packets input, 568106584 bytes, 0 no buffer
Received 12 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
1623842 packets output, 1114306193 bytes, 0 underruns
0 output errors, 0 interface resets
0 output buffer failures, 0 output buffers swapped out
FastEthernet1/36 is up, line protocol is up (connected)
Hardware is C6k 100Mb 802.3, address is 0009.11f6.35b3 (bia 0009.11f6.35b3)
Description: Spamcan new port - pa testing
MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
reliability 255/255, txload 2/255, rxload 2/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, media type is 10/100BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:25, output never, output hang never
Last clearing of "show interface" counters 01:50:32
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 2468
Queueing strategy: fifo
Output queue: 0/40 (size/max)
30 second input rate 942000 bits/sec, 289 packets/sec
30 second output rate 1097000 bits/sec, 271 packets/sec
1592429 packets input, 569790617 bytes, 0 no buffer
Received 3434 broadcasts (3422 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
1626638 packets output, 1105460469 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
sh int fast1/36 counters error
Port         Align-Err    FCS-Err    Xmit-Err     Rcv-Err  UnderSize  OutDiscards
Fa1/36               0          0           0           0          0         2468

Port        Single-Col  Multi-Col   Late-Col  Excess-Col  Carri-Sen      Runts     Giants
Fa1/36               0          0          0           0          0          0          0

Port       SQETest-Err  Deferred-Tx  IntMacTx-Err  IntMacRx-Err  Symbol-Err
Fa1/36               0            0             0             0           0
Interface FastEthernet1/36 queueing strategy: Weighted Round-Robin
Port QoS is enabled
Trust boundary disabled
Port is untrusted
Extend trust state: not trusted [COS = 0]
Default COS is 0
Queueing Mode In Tx direction: mode-cos
Transmit queues [type = 1p3q1t]:
Queue Id Scheduling Num of thresholds
-----------------------------------------
1 WRR 1
2 WRR 1
3 WRR 1
4 Priority 1
WRR bandwidth ratios: 100[queue 1] 150[queue 2] 200[queue 3]
queue random-detect-min-thresholds
----------------------------------
1 70[1]
2 70[1]
3 70[1]
queue random-detect-max-thresholds
----------------------------------
1 100[1]
2 100[1]
3 100[1]
WRED disabled queues:
queue thresh cos-map
---------------------------------------
1 1 0 1
2 1 2 3 4
3 1 6 7
4 1 5
04-24-2016 01:46 AM
The output doesn't really help because it is taken from a Virtual Interface. I'm keen to know which PHYSICAL interface the output drops are coming from.
04-25-2016 07:55 AM
Leo, I have pasted the physical interface stats for FastEthernet1/36; please look at the original message.
thanks, p
04-25-2016 11:26 AM
Output discards on a FastEthernet port??? What client is connected to this port?
04-25-2016 12:41 PM
Yes, on a FastEthernet port, set for full/100. Right now there is a Cisco Catalyst set for full/100 with no errors. It's an L2 switch with about 5 Linux SMTP servers.
I just can't figure out what is causing the output drops on the 6500 side.
paul
04-25-2016 01:18 PM
The 6548 line cards are no match for the Linux servers. The 6548 line cards are not meant to "live" or "do" data centre work.
04-25-2016 01:36 PM
So what do you think is happening? We have other servers/switches connected to the 6548 line card that see no output drops.
Also, the interface is connected to another L2 switch which sees no output drops. The servers are not connected directly to the 6548 line card.
04-25-2016 01:44 PM
What I think is happening is that each of the Linux servers is overwhelming the notoriously shallow memory buffers of the 6548 line cards.
The solutions to this are:
1. Apply QoS; or
2. Use 6748 line cards.
Unfortunately, QoS is not my strong suit. One of the Cisco VIPs, Joseph, is good at it, and if he's not busy he normally lurks around and will chime in.
04-26-2016 06:56 AM
Is there a way to see memory buffers on those cards? Also, shouldn't I see drops across all ports on that card? I'm only seeing it on some ports, the busiest ports.
The other thing is the L2 switch connected to it is an old Cisco Catalyst 2900 and there are no output drops. Could that line card really be that bad?
04-26-2016 08:42 AM
You can use the show buffers input-interface <interface x/x> packet/header command. However, you may be hitting this bug if you are only seeing the drops increase but there is no actual network degradation:
https://bst.cloudapps.cisco.com/bugsearch/bug/CSCdz02952/?reffering_site=dumpcr
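For example (exact syntax can vary by IOS version; the interface here is just the one from your output):

show buffers input-interface FastEthernet 1/36 packet
show buffers input-interface FastEthernet 1/36 header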
Hope it helps, best regards!
JC
04-26-2016 08:58 AM
The buffer command will not show any packet buffer info; I have tried that. Also, the line card I have runs software 12.2, so it's not affected by that bug. It's just really strange to have the output counter increment for no reason, well, none that I can see.
05-06-2016 08:52 AM
Leo, I have done some research and I'm not convinced that the issue is what you described above; it's a problem with the hardware queue, which is also why setting the software queue didn't make a difference. I have mls qos enabled and it's using the hardware queues on the line cards.
My question is: the 6548 shows 1088KB for a TX queue buffer while the 6748 shows 1.2MB. Do you think that extra ~300KB will make a difference?
Module           Ports         Rx Queue   Tx Queue   Total Buffer   Rx Buffer   Tx Buffer
WS-X6748-GE-TX   48 x GE TX    1Q8T       1P3Q8T     1.3MB          166KB       1.2MB
WS-X6548-RJ-45   48 x 10/100   1P1Q0T     1P3Q1T     1.1MB          28KB        1088KB
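Rough arithmetic (assuming the full per-port TX buffer is usable by this port's queues):

1088 KB ≈ 8.9 Mbit → ~90 ms to drain at 100 Mb/s
1.2 MB  ≈ 10 Mbit  → ~100 ms at 100 Mb/s, or ~10 ms at 1 Gb/s

So the extra buffer alone buys only a few tens of milliseconds at 100Mb.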
05-09-2016 06:02 AM
do you think that extra ~300KB will make a difference?
Possibly, although a bigger difference using a 6748 might be being able to run at gig rather than 100. (Does your 2900 support gig on an uplink port?)
Generally, unless you're bumping into some kind of bug, Cisco device queues drop packets when they overflow. When overall utilization appears low, microbursts are a common cause, although you say you believe this isn't happening. (How did you obtain the packet dump you used? If you used Cisco's SPAN, I'm unsure its replication would be accurate enough to represent what the egress port hardware is actually "seeing" frame/packet timing wise.)
As you mention you're using MLS QoS, it's possible some tuning of the interface's QoS settings (e.g. wrr-queue queue-limit and/or wrr-queue bandwidth) might remediate your drops. (I don't recall how a 6548 shares buffers between its queues.)
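For example, something along these lines on the egress interface (a sketch only; exact options depend on the line card and IOS version, and the 1p3q1t ports on a 6548 may not accept queue-limit at all):

interface FastEthernet1/36
 ! weight the WRR scheduler toward queue 1, where CoS 0/1 traffic sits by default
 wrr-queue bandwidth 200 100 100
 ! on queue types that support it, shift buffer space toward queue 1
 wrr-queue queue-limit 70 20 10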
Also, shouldn't I see drops across all ports on that card? I'm only seeing it on some ports, the busiest ports.
Since on this particular card, buffers are allocated per port, what you state makes sense, i.e. you only see drops on the busiest ports.
05-09-2016 07:23 AM
Joseph, it was set to 1000 using an HP switch before I switched it to the 2900 at 100. The thing is I never see input errors on the 2900 or HP switch. Do you think going with the WS-X6748-GE-TX would make a difference? I'm not 100% sure.
I used SPAN and the Wireshark IO graph to look at the bandwidth used in real time. I was seeing output drops at 3 to 5 Mb and never saw any indication of microbursts, although I'm assuming I should be seeing spikes near the 100Mb limit, and I'm not seeing anything near that.
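For reference, the SPAN session looked roughly like this (the destination port is just a placeholder for where my capture box sits):

monitor session 1 source interface FastEthernet 1/36 tx
monitor session 1 destination interface FastEthernet 1/37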
I tweaked the WRR queue bandwidth several times with no effect (I don't have the queue-limit option). Although I have mls qos enabled and dscp trust on that port, most packets destined for this port are SMTP traffic with low QoS markings, so the packets never take advantage of the priority queue and are dropped in the first queue.
Packets dropped on Transmit:
BPDU packets: 0
que  thr  dropped    30-s bytes  peak bytes  5-mins avg bps  peak bps  [cos-map]
---------------------------------------------------------------------------------
1    1       1256             0           0               0         0  [0 1]
thanks, paul
05-09-2016 07:59 AM
Joseph, it was set to 1000 using an HP switch before I switched it to the 2900 at 100. The thing is I never see input errors on the 2900 or HP switch. Do you think going with the WS-X6748-GE-TX would make a difference? I'm not 100% sure.
I wouldn't expect egress drops to cause input errors unless the receiving device couldn't deal with 100 Mbps.
Again, a 6748 might help: because of the extra 300K, because you could run at gig rather than 100, and because it should support the queue-limit command too.
What you might try, if you don't have a 6748 port at hand, is a sup port. The sup720's ports aren't the best for uplinks, but it might be interesting to see if your results vary.
I used SPAN and the Wireshark IO graph to look at the bandwidth used in real time. I was seeing output drops at 3 to 5 Mb and never saw any indication of microbursts, although I'm assuming I should be seeing spikes near the 100Mb limit, and I'm not seeing anything near that.
Again, I'm unsure SPAN doesn't distort timings. What port were you SPANing, the egress port?
Could more than one port be sending traffic to the egress port? If so, consider that just two concurrent 100 Mbps streams would be sending 200 Mbps of traffic to a 100 Mbps port. If that happens, the question is whether the concurrent traffic exceeds the allocated buffer space. If the whole MB of buffer space were available, one would expect it to often be ample, but if the interface reserves it per egress queue, and you cannot adjust the split, then you're more likely to overflow an individual queue's buffers.
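Rough numbers to illustrate (assuming the full 1088KB TX buffer backs the congested queue): with 200 Mbps arriving and 100 Mbps draining, the excess 100 Mbps fills 1088KB (≈ 8.9 Mbit) in roughly 90 ms, so overlapping streams lasting only a tenth of a second are enough to start discarding; if only one WRR queue's share of the buffer is in play, the overflow comes proportionally sooner.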