Solved: consider replacing the SFP

Antony Pasteris · ‎05-10-2016

We have put a Catalyst 3850-12XS running version 03.07.03E into service and have noticed high output drops on certain ports. Auto QoS is configured on this switch, and we tried removing the QoS configuration from the affected ports, but it didn't change anything. From a performance point of view, the switch is currently running without problems and there is no network outage.

TenGigabitEthernet1/0/1 is up, line protocol is up (connected)
Hardware is Ten Gigabit Ethernet, address is 00cc.fc68.f681 (bia 00cc.fc68.f681)
Description: VLAN 599 XXXXXXXX
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 128/255, txload 16/255, rxload 10/255
Encapsulation ARPA, loopback not set
Keepalive not set
Full-duplex, 1000Mb/s, link type is auto, media type is 10/100/1000BaseTX SFP
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:19, output never, output hang never
Last clearing of "show interface" counters 10:12:51
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 249067352
Queueing strategy: Class-based queueing
Output queue: 0/40 (size/max)
5 minute input rate 41672000 bits/sec, 6894 packets/sec
5 minute output rate 65358000 bits/sec, 8267 packets/sec
69357766 packets input, 54278801831 bytes, 0 no buffer
Received 1362 broadcasts (1226 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 1226 multicast, 0 pause input
0 input packets with dribble condition detected
97533479 packets output, 108964531997 bytes, 0 underruns
249067352 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

As the ping below shows (it sends traffic through the link described above), there seem to be no connectivity issues, even though the output drops counter suggests something completely different.

XXXXXX#ping 4.2.2.1 repeat 1000 size 100
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 4.2.2.1, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (1000/1000), round-trip min/avg/max = 10/14/30 ms

Could this be a bug?

Leo Laohoo · ‎05-10-2016

reliability 128/255, txload 16/255, rxload 10/255

It's a cable issue.

249067352 output errors, 0 collisions, 0 interface resets

A high number of output errors points to a cable issue.

Robert Hillcoat · ‎10-26-2016

Hi everyone, a document was released on 30 September 2016 with an in-depth explanation of this issue.

http://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/200594-Catalyst-3850-Troubleshooting-Output-dr.html

It resolved the issues I had on the 3850 switch.

Antony Pasteris · ‎05-10-2016

Hello Leo, I thought so at first, but that is not the case: before the 3850 was installed, a 3750X ran here for the last 3 years and there were never any problems with the cabling. The symptom is seen on ports with FO and FX SFP modules. The migration was a one-to-one replacement of the 3750X with a 3850.

Anuar Shahrin · ‎05-10-2016

Consider replacing the SFP module. It seems like a physical issue.

Antony Pasteris · ‎05-10-2016

Hello Anuar, we have already attempted replacing the SFP modules but it didn't change anything.

ANANTH KUMAR RACHAKONDA · ‎08-09-2016

Same problem with our switch, a WS-C3850-48T running software 03.06.04.E.

It's a standalone switch (no stack). Note the number of errors: 638604.

xxxxx#sh int gi1/0/48 counters errors

Port        Align-Err     FCS-Err    Xmit-Err     Rcv-Err UnderSize OutDiscards
Gi1/0/48            0           0      638604           0          0       638604

Port      Single-Col Multi-Col   Late-Col Excess-Col Carri-Sen      Runts
Gi1/0/48           0          0          0           0          0          0
xxxxxxx#

xxxxxxx#sh int gi1/0/48
GigabitEthernet1/0/48 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is 008e.73ac.b971 (bia 008e.73ac.b971)
Description: ## Router2 ##
Internet address is x.x.x.x/30
MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output never, output hang never
Last clearing of "show interface" counters 22:08:35
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 638604
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 27000 bits/sec, 18 packets/sec
5 minute output rate 59000 bits/sec, 18 packets/sec
     2635562 packets input, 2002679603 bytes, 0 no buffer
     Received 134 broadcasts (0 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 134 multicast, 0 pause input
     0 input packets with dribble condition detected
     2336830 packets output, 998183575 bytes, 0 underruns
     638604 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

See the excerpt below from the Cisco website. Note that they also say not to increase the output queue size.

http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/12027-53.html

output errors

Description: Cisco IOS sh interfaces counter. The sum of all errors that prevented the final transmission of datagrams out of the interface. Common Cause: This issue is due to the low Output Queue size.
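
One way to read the `sh int gi1/0/48 counters errors` output above is to check whether every transmit error is also counted as an output discard. The Python sketch below parses the first table row and applies that check. The helper names and the heuristic itself are assumptions made for illustration, not something taken from Cisco documentation: matching Xmit-Err and OutDiscards values with zero FCS and alignment errors suggest packets dropped in the egress queue rather than a damaged cable, but they do not prove it.

```python
# Header and row copied from the "counters errors" output in this post.
HEADER = "Port        Align-Err     FCS-Err    Xmit-Err     Rcv-Err UnderSize OutDiscards"
ROW = "Gi1/0/48            0           0      638604           0          0       638604"


def parse_counter_row(header, row):
    """Return (port, {counter_name: value}) for one table row."""
    names = header.split()
    values = row.split()
    port = values[0]
    counters = {name: int(value) for name, value in zip(names[1:], values[1:])}
    return port, counters


def looks_like_queue_drops(counters):
    """Hypothetical heuristic: all transmit errors are discards, no frame errors."""
    return (
        counters["Xmit-Err"] > 0
        and counters["Xmit-Err"] == counters["OutDiscards"]
        and counters["FCS-Err"] == 0
        and counters["Align-Err"] == 0
    )


port, counters = parse_counter_row(HEADER, ROW)
print(port, counters["Xmit-Err"], looks_like_queue_drops(counters))
```

For the row above, all 638604 transmit errors are output discards and the frame-error counters are zero, so the check returns True.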

Antony Pasteris · ‎08-09-2016

At first we thought the problem was caused by QoS, but after several attempts at fine-tuning we didn't notice any big change and the drops were still there. So far we still haven't received a solution from TAC.

I don't think it makes sense to claim that "oversubscription" is the problem, since we are seeing 200M of traffic on a 10G link. The only explanation received so far is "microbursts".

Troubleshoot Output Drops

Typically, the output drops will occur if QoS is configured and it is not providing enough bandwidth to certain class of packets. It also occurs when we are hitting oversubscription.
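
The microburst explanation can be illustrated with some simple arithmetic. The Python sketch below models a short burst arriving faster than the egress port can drain it and compares the bytes dropped with the average rate the same traffic would show over a long interval. All numbers (the 4 Gbps aggregate arrival rate, the 2 ms burst, the 200 KB buffer) are hypothetical and chosen only for illustration; they are not 3850 buffer sizes or measurements from this thread.

```python
def burst_drop_bytes(arrival_bps, egress_bps, burst_ms, buffer_bytes):
    """Bytes tail-dropped when a burst arrives faster than the port drains."""
    # The queue only grows while arrivals outpace the egress line rate.
    excess_bps = max(arrival_bps - egress_bps, 0)
    backlog_bytes = excess_bps * burst_ms / 8000  # bits/s * ms -> bytes
    # Whatever does not fit in the buffer is dropped.
    return max(backlog_bytes - buffer_bytes, 0)


def average_bps(arrival_bps, burst_ms, bursts_per_second):
    """Long-term average rate produced by short, repeated bursts."""
    return arrival_bps * burst_ms * bursts_per_second / 1000


# Hypothetical case: four 1 Gbps senders burst together for 2 ms,
# once per second, toward a single 1 Gbps egress port with 200 KB of buffer.
dropped = burst_drop_bytes(4_000_000_000, 1_000_000_000, 2, 200_000)
average = average_bps(4_000_000_000, 2, 1)
print(dropped, average)
```

Under these assumptions each burst loses 550,000 bytes even though the traffic averages only 8 Mbps, which is why a 5 minute rate far below line speed does not rule out queue drops.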

mseanmiller · ‎10-26-2016

I had the same issue with a port channel and its member interfaces (see below). I opened a case with TAC, and they resolved it by raising the QoS softmax multiplier from 100 to 1200 and by creating a policy map and applying it to the TenGigabit interfaces in the port channel.

Changes I made to correct the issue in bold.

--------------------------------------------------------------

qos queue-softmax-multiplier 1200

class-map match-any TACTEST
 match access-group name TACTEST

policy-map TACTEST
 class class-default
  bandwidth percent 100

interface TenGigabitEthernet1/0/13
 description ITLESXI2 Eth1
 switchport trunk native vlan 150
 switchport trunk allowed vlan 3,110,111,150
 switchport mode trunk
 load-interval 30
 channel-group 3 mode on
 spanning-tree guard root
 service-policy output TACTEST

ip access-list extended TACTEST
 permit ip any any

--------------------------------------------------------------------------------

Port-channel3 is up, line protocol is up (connected)
Hardware is EtherChannel, address is 042a.e2fe.a10d (bia 042a.e2fe.a10d)
MTU 9198 bytes, BW 2000000 Kbit/sec, DLY 10 usec,
     reliability 161/255, txload 44/255, rxload 19/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, link type is auto, media type is
input flow-control is off, output flow-control is unsupported
Members in this channel: Te1/0/13 Te2/0/13
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output never, output hang never
Last clearing of "show interface" counters 00:22:06
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 3873968431
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 151997000 bits/sec, 5174 packets/sec
5 minute output rate 350886000 bits/sec, 10290 packets/sec
     6630412 packets input, 24572245522 bytes, 0 no buffer
     Received 327 broadcasts (236 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 236 multicast, 0 pause input
     0 input packets with dribble condition detected
     13046420 packets output, 56033671803 bytes, 0 underruns
     3873968431 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

Now this is what I'm seeing after the changes. Very clean and the reliability is back to 255/255.

Port-channel3 is up, line protocol is up (connected)
Hardware is EtherChannel, address is 042a.e2fe.a10d (bia 042a.e2fe.a10d)
MTU 9198 bytes, BW 2000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 47/255, rxload 21/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, link type is auto, media type is
input flow-control is off, output flow-control is unsupported
Members in this channel: Te1/0/13 Te2/0/13
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output never, output hang never
Last clearing of "show interface" counters 00:42:11
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 166050000 bits/sec, 5423 packets/sec
5 minute output rate 370633000 bits/sec, 10909 packets/sec
     13354300 packets input, 50822397908 bytes, 0 no buffer
     Received 647 broadcasts (467 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 467 multicast, 0 pause input
     0 input packets with dribble condition detected
     27159924 packets output, 115505462629 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

netops_cortera · ‎09-17-2017

This did the trick for me. The problem started after upgrading to 03.06.06, and only on my firewall ports.

karampudi · ‎05-22-2020

Thanks for sharing the QoS command. It resolved the issues I was having on 3850 switches.

ANANTH KUMAR RACHAKONDA · ‎08-09-2016

I found the issue. Here is how I fixed it:

1) Our link to the WAN is 100 Mbps.

2) But user ports are configured at 1000 Mbps.

3) Inter-switch links (access to core) are also 1000 Mbps.

The users' shared drive is in our data center (remote site). When they copy a large file from a local PC to the shared drive in the DC, I see these drops/errors.

To fix the issue I changed the user port speed from 1000/full to 100/full.

I also noticed that reliability was poor when traffic on the interface was high. See txload and reliability in the samples below:

reliability 255/255, txload 1/255, rxload 35/255
Last clearing of "show interface" counters 00:24:06
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 103496

103496 output errors, 0 collisions, 0 interface resets

reliability 212/255, txload 54/255, rxload 28/255
Last clearing of "show interface" counters 00:24:12
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 313524

313524 output errors, 0 collisions, 0 interface resets

reliability 186/255, txload 108/255, rxload 21/255
Last clearing of "show interface" counters 00:24:32
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 471812

471812 output errors, 0 collisions, 0 interface resets

reliability 225/255, txload 173/255, rxload 12/255
Last clearing of "show interface" counters 00:24:50
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 471812

471812 output errors, 0 collisions, 0 interface resets

reliability 235/255, txload 189/255, rxload 10/255
Last clearing of "show interface" counters 00:24:53
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 471812
471812 output errors, 0 collisions, 0 interface resets

reliability 242/255, txload 162/255, rxload 11/255
Last clearing of "show interface" counters 00:25:03
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 471812
471812 output errors, 0 collisions, 0 interface resets
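
The snapshots above are easier to compare as per-interval deltas. The Python sketch below parses the first four samples from this post and prints how many seconds passed and how much the drop counter grew between consecutive samples. The function names are illustrative, and the regular expressions assume the exact `show interface` wording shown here.

```python
import re

# First four snapshots copied from the post above.
SNAPSHOTS = """\
reliability 255/255, txload 1/255, rxload 35/255
Last clearing of "show interface" counters 00:24:06
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 103496
reliability 212/255, txload 54/255, rxload 28/255
Last clearing of "show interface" counters 00:24:12
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 313524
reliability 186/255, txload 108/255, rxload 21/255
Last clearing of "show interface" counters 00:24:32
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 471812
reliability 225/255, txload 173/255, rxload 12/255
Last clearing of "show interface" counters 00:24:50
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 471812
"""


def parse_snapshots(text):
    """Return a list of (seconds_since_clear, txload, total_output_drops)."""
    loads = [int(x) for x in re.findall(r"txload (\d+)/255", text)]
    clocks = re.findall(r"counters (\d+):(\d+):(\d+)", text)
    drops = [int(x) for x in re.findall(r"Total output drops: (\d+)", text)]
    seconds = [int(h) * 3600 + int(m) * 60 + int(s) for h, m, s in clocks]
    return list(zip(seconds, loads, drops))


def drop_deltas(samples):
    """Return (elapsed_seconds, drop_increase) between consecutive samples."""
    return [
        (t1 - t0, d1 - d0)
        for (t0, _, d0), (t1, _, d1) in zip(samples, samples[1:])
    ]


samples = parse_snapshots(SNAPSHOTS)
for elapsed, increase in drop_deltas(samples):
    print(f"{elapsed:>3}s  +{increase}")
```

In these samples the counter grows during the first 26 seconds and then stays at 471812 while txload keeps rising. Since txload is a rolling average, it lags the actual traffic, so the timing should be read loosely.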

Ralph Laemmermeyer · ‎05-12-2016

We see the same issue on several interfaces with 100 Mbit connections (Cisco phones, printer switch, etc.) on a 3850 stack. We didn't see it with 3.03.05, but we do with 3.06.03 and 3.06.04.

Update:

After a lot of testing we found out that we run into this issue whenever we have a 10 Mbit or 100 Mbit connection.

GER-DV5-STACK241#sh int gi2/0/23
GigabitEthernet2/0/23 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is bcf1.f2af.aa17 (bia bcf1.f2af.aa17)
MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:35, output never, output hang never
Last clearing of "show interface" counters 01:49:01
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 270067
Queueing strategy: Class-based queueing
Output queue: 0/200 (size/max)
5 minute input rate 3000 bits/sec, 2 packets/sec
5 minute output rate 190000 bits/sec, 106 packets/sec
21061 packets input, 3318700 bytes, 0 no buffer
Received 780 broadcasts (237 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 237 multicast, 0 pause input
0 input packets with dribble condition detected
678826 packets output, 154522799 bytes, 0 underruns
270067 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

Ralf

Robert Hillcoat · ‎06-27-2016

Do the drops increase when you negotiate a lower bandwidth on the link?

I'm looking at a similar issue. Could it be that the queue buffers on the interface shrink linearly with the negotiated interface speed? For example, a 1 Gbps interface that negotiates 100 Mbps would have fewer buffers than if it negotiated the full 1 Gbps.

Either way, does it only surface from a certain code version onward?

ANANTH KUMAR RACHAKONDA · ‎08-09-2016

I think too much traffic is going through these interfaces, as they are 10 and 100.

Try to find the source of the traffic (its port may be set to 1000) and restrict it to 10 or 100 where it enters the network. I think that will resolve the issue; see below.

HTH

Ralph Laemmermeyer · ‎10-12-2016

No, definitely not. I upgraded two more stacks (which had no issues on 3.3.2/3.3.1) to 3.6.5 (I have tried all 3.6.x versions so far) and the issue started on those devices as well. It happens on all ports where 10 and 100 Mbit devices are connected. We are talking about printers and phones here. Some of the phones are standalone with no PC connected, and it happens there too. I can check the traffic with our monitoring system, and I have never seen a lot of traffic when this issue occurs. That's a bug!

Thanks Ralf
