Catalyst 3850 high Total output drops and output errors

Antony Pasteris
Level 1

We have put in service a Catalyst 3850-12XS running version 03.07.03E and have noticed high output drops on certain ports. AutoQoS is configured on this switch; we tried removing the QoS config from the ports having the problem, but it didn't change anything. From a performance point of view, the switch is currently running without problems... there is no network outage.

TenGigabitEthernet1/0/1 is up, line protocol is up (connected)
Hardware is Ten Gigabit Ethernet, address is 00cc.fc68.f681 (bia 00cc.fc68.f681)
Description: VLAN 599 XXXXXXXX
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 128/255, txload 16/255, rxload 10/255
Encapsulation ARPA, loopback not set
Keepalive not set
Full-duplex, 1000Mb/s, link type is auto, media type is 10/100/1000BaseTX SFP
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:19, output never, output hang never
Last clearing of "show interface" counters 10:12:51
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 249067352
Queueing strategy: Class-based queueing
Output queue: 0/40 (size/max)
5 minute input rate 41672000 bits/sec, 6894 packets/sec
5 minute output rate 65358000 bits/sec, 8267 packets/sec
69357766 packets input, 54278801831 bytes, 0 no buffer
Received 1362 broadcasts (1226 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 1226 multicast, 0 pause input
0 input packets with dribble condition detected
97533479 packets output, 108964531997 bytes, 0 underruns
249067352 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

As you can see from the ping below, which sends traffic through the link described above, there seem to be no connectivity issues, even though the output drops counter indicates something completely different.

XXXXXX#ping 4.2.2.1 repeat 1000 size 100
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 4.2.2.1, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (1000/1000), round-trip min/avg/max = 10/14/30 ms

Could this be a bug?

2 Accepted Solutions

Leo Laohoo
Hall of Fame

reliability 128/255, txload 16/255, rxload 10/255

It's a cable issue. 

249067352 output errors, 0 collisions, 0 interface resets

A high number of output errors points to a cable issue.


Robert Hillcoat
Level 1

Hi everyone, a document was released on the 30th of September 2016 with an in-depth explanation of this issue.

http://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/200594-Catalyst-3850-Troubleshooting-Output-dr.html

It has resolved the issues I had on the 3850 switch.


48 Replies


Hello Leo, I thought so at first, but this is not the case: before the installation of the 3850 there was a 3750X that had been running for the last 3 years, and there were never any problems with the cabling. The symptom is seen on ports with FO and FX SFP modules. The migration to the 3850 was one-to-one, where the 3750X was replaced with a 3850.

Consider replacing the SFP module; this seems like a physical issue.

Hello Anuar, we have already tried replacing the SFP modules, but it didn't change anything.

We have the same problem with our switch, a WS-C3850-48T running S/W 03.06.04.E.

It's a standalone (no stack); see the number of errors: 638604.

xxxxx#sh int gi1/0/48 counters errors

Port        Align-Err     FCS-Err    Xmit-Err     Rcv-Err  UnderSize  OutDiscards
Gi1/0/48            0           0      638604           0          0       638604

Port      Single-Col  Multi-Col   Late-Col  Excess-Col  Carri-Sen      Runts
Gi1/0/48           0          0          0           0          0          0
xxxxxxx#

xxxxxxx#sh int gi1/0/48       
GigabitEthernet1/0/48 is up, line protocol is up (connected)
  Hardware is Gigabit Ethernet, address is 008e.73ac.b971 (bia 008e.73ac.b971)
  Description: ## Router2 ##
  Internet address is x.x.x.x/30
  MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:00, output never, output hang never
  Last clearing of "show interface" counters 22:08:35
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 638604
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 27000 bits/sec, 18 packets/sec
  5 minute output rate 59000 bits/sec, 18 packets/sec
     2635562 packets input, 2002679603 bytes, 0 no buffer
     Received 134 broadcasts (0 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 134 multicast, 0 pause input
     0 input packets with dribble condition detected
     2336830 packets output, 998183575 bytes, 0 underruns
     638604 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

See below from the Cisco website; note that they also say not to increase the output queue size.

http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/12027-53.html

output errors Description: Cisco IOS sh interfaces counter. The sum of all errors that prevented the final transmission of datagrams out of the interface. Common Cause: This issue is due to the low Output Queue size.

At first we thought the problem was caused by QoS, but after several attempts at fine-tuning we didn't notice any big change, and the drops were still there. So far we still haven't got a solution from TAC.

I believe it doesn't make sense to claim that "oversubscription" is the problem, since we are seeing 200M of traffic on a 10G link. The only explanation received so far is "microbursts".
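One thing worth noting about microbursts: the 5-minute rates in "show interface" average bursts away, so a link can look lightly loaded while still overflowing a queue for a few milliseconds. Shortening the load interval on the suspect port (a standard IOS interface command; the interface name below is just taken from the output earlier in the thread) makes short spikes easier to catch:

interface TenGigabitEthernet1/0/1
 load-interval 30

With a 30-second load interval, the txload/rxload and bit-rate counters react much faster, so a burst is more likely to show up in the rates, but even 30 seconds can hide a true microburst; correlating the drop counter with traffic captures is the only conclusive check.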

Troubleshoot Output Drops

Typically, output drops occur when QoS is configured and is not providing enough bandwidth to a certain class of packets. They also occur when the link is oversubscribed.
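On the 3850 specifically, the drops counted on the interface come from the hardware egress queues, so the per-queue counters are more telling than the aggregate interface counter. Assuming a 3.x IOS-XE release, something like the following should show the buffer allocation and which queue is dropping (these are the platform commands referenced in the Cisco troubleshooting document linked earlier in this thread; exact syntax can vary by release, so verify on yours):

show platform qos queue config gigabitEthernet 1/0/48
show platform qos queue stats gigabitEthernet 1/0/48

If the drops land in a single queue while the others are idle, that points at the QoS policy or buffer allocation for that queue rather than at overall oversubscription.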

I had the same issue with a port channel and the interfaces attached to it (see below). I opened a case with TAC, and they were able to resolve the issue by changing the QoS softmax value from 100 to 1200 and creating a policy map, then applying that change to the TenGigabit interfaces in the port channel.

The changes I made to correct the issue are shown below.

--------------------------------------------------------------

qos queue-softmax-multiplier 1200


class-map match-any TACTEST
 match access-group name TACTEST

policy-map TACTEST
 class class-default
  bandwidth percent 100



interface TenGigabitEthernet1/0/13
 description ITLESXI2 Eth1
 switchport trunk native vlan 150
 switchport trunk allowed vlan 3,110,111,150
 switchport mode trunk
 load-interval 30
 channel-group 3 mode on
 spanning-tree guard root
 service-policy output TACTEST


ip access-list extended TACTEST
 permit ip any any

--------------------------------------------------------------------------------

Port-channel3 is up, line protocol is up (connected)
  Hardware is EtherChannel, address is 042a.e2fe.a10d (bia 042a.e2fe.a10d)
  MTU 9198 bytes, BW 2000000 Kbit/sec, DLY 10 usec,
     reliability 161/255, txload 44/255, rxload 19/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, link type is auto, media type is
  input flow-control is off, output flow-control is unsupported
  Members in this channel: Te1/0/13 Te2/0/13
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output never, output hang never
  Last clearing of "show interface" counters 00:22:06
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 3873968431
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 151997000 bits/sec, 5174 packets/sec
  5 minute output rate 350886000 bits/sec, 10290 packets/sec
     6630412 packets input, 24572245522 bytes, 0 no buffer
     Received 327 broadcasts (236 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 236 multicast, 0 pause input
     0 input packets with dribble condition detected
     13046420 packets output, 56033671803 bytes, 0 underruns
     3873968431 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

Now this is what I'm seeing after the changes. Very clean and the reliability is back to 255/255.

Port-channel3 is up, line protocol is up (connected)
  Hardware is EtherChannel, address is 042a.e2fe.a10d (bia 042a.e2fe.a10d)
  MTU 9198 bytes, BW 2000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 47/255, rxload 21/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, link type is auto, media type is
  input flow-control is off, output flow-control is unsupported
  Members in this channel: Te1/0/13 Te2/0/13
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output never, output hang never
  Last clearing of "show interface" counters 00:42:11
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 166050000 bits/sec, 5423 packets/sec
  5 minute output rate 370633000 bits/sec, 10909 packets/sec
     13354300 packets input, 50822397908 bytes, 0 no buffer
     Received 647 broadcasts (467 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 467 multicast, 0 pause input
     0 input packets with dribble condition detected
     27159924 packets output, 115505462629 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

This did the trick for me; the issue started after upgrading to 03.06.06 and only on my firewall ports.

Thanks for sharing the QoS command; it resolved the issues I was having on 3850 switches.

I found the issue; here is how I fixed it:

1) Our link to WAN is 100Mbps.

2) But user ports are configured with 1000Mbps.

3) Inter-switch links (access to core) are also 1000Mbps.

The users' shared drive is in our data center (remote site); when they try to copy a large file from a local PC to the shared drive in the DC, I see these drops/errors.

To fix the issue, I changed the user port speed from 1000/full to 100/full.
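A minimal sketch of that change (standard IOS interface commands; the port number here is just an example, not from the thread):

interface GigabitEthernet1/0/10
 speed 100
 duplex full

Keep in mind that hard-coding speed and duplex this way disables auto-negotiation on the port, so the attached device must be set to 100/full as well, or you will trade the output drops for a duplex mismatch.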

I also noticed the reliability was poor when traffic on the interface was high; see the txload and reliability values below:

     reliability 255/255, txload 1/255, rxload 35/255
  Last clearing of "show interface" counters 00:24:06
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 103496

103496 output errors, 0 collisions, 0 interface resets



     reliability 212/255, txload 54/255, rxload 28/255
  Last clearing of "show interface" counters 00:24:12
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 313524

313524 output errors, 0 collisions, 0 interface resets




     reliability 186/255, txload 108/255, rxload 21/255
  Last clearing of "show interface" counters 00:24:32
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 471812

471812 output errors, 0 collisions, 0 interface resets



     reliability 225/255, txload 173/255, rxload 12/255
  Last clearing of "show interface" counters 00:24:50
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 471812

471812 output errors, 0 collisions, 0 interface resets


     reliability 235/255, txload 189/255, rxload 10/255
  Last clearing of "show interface" counters 00:24:53
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 471812
471812 output errors, 0 collisions, 0 interface resets


     reliability 242/255, txload 162/255, rxload 11/255
  Last clearing of "show interface" counters 00:25:03
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 471812
471812 output errors, 0 collisions, 0 interface resets



We see the same issues on several interfaces with 100 Mbit connections (Cisco phones, a printer switch, etc.) on a 3850 stack. We didn't see it with 3.03.05, but we do with 3.06.03 and 3.06.04.

Update:

After a lot of testing, we found that we run into this issue whenever we have a 10 Mbit or 100 Mbit connection:

GER-DV5-STACK241#sh int gi2/0/23
GigabitEthernet2/0/23 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is bcf1.f2af.aa17 (bia bcf1.f2af.aa17)
MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:35, output never, output hang never
Last clearing of "show interface" counters 01:49:01
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 270067
Queueing strategy: Class-based queueing
Output queue: 0/200 (size/max)
5 minute input rate 3000 bits/sec, 2 packets/sec
5 minute output rate 190000 bits/sec, 106 packets/sec
21061 packets input, 3318700 bytes, 0 no buffer
Received 780 broadcasts (237 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 237 multicast, 0 pause input
0 input packets with dribble condition detected
678826 packets output, 154522799 bytes, 0 underruns
270067 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

Ralf

When you negotiate a lower bandwidth on the link, do the drops increase?

I'm looking at a similar issue. Could it be that the queue buffers on the interface scale linearly with the negotiated interface speed? For example, a 1 Gbps interface that negotiates 100 Mbps would have fewer buffers than if it negotiated the full 1 Gbps.

Either way, it only surfaces once you move to a certain level of code?

I think too much traffic is going through these interfaces, as they are 10 and 100 Mbit.

Try to find the source of the traffic (a port may be set to 1000) and restrict it to 10 or 100 where it enters the network; that way, I think you will resolve this issue. See below.

HTH

No,

Definitely not. I upgraded 2 more stacks (without any issue before 3.3.2/3.3.1) to 3.6.5 (I have tried all versions of 3.6.x so far), and the issue starts on those devices as well. It happens on all ports where 100 and 10 Mbit devices are connected. We are talking here about printers and phones. Some of the phones are standalone with no PC connected, and it happens too. I can check the traffic with our monitoring system, and I have never seen a lot of traffic when this issue occurs. That's a bug!

Thanks Ralf
