
Catalyst 3850 high Total output drops and output errors

Antony Pasteris
Level 1

We have put a Catalyst 3850-12XS running version 03.07.03E into service and have noticed high output drops on certain ports. Auto QoS is configured on this switch; we tried removing the QoS config from the affected ports, but it didn't change anything. From a performance point of view, the switch is currently running without problems... there is no network outage.

TenGigabitEthernet1/0/1 is up, line protocol is up (connected)
Hardware is Ten Gigabit Ethernet, address is 00cc.fc68.f681 (bia 00cc.fc68.f681)
Description: VLAN 599 XXXXXXXX
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 128/255, txload 16/255, rxload 10/255
Encapsulation ARPA, loopback not set
Keepalive not set
Full-duplex, 1000Mb/s, link type is auto, media type is 10/100/1000BaseTX SFP
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:19, output never, output hang never
Last clearing of "show interface" counters 10:12:51
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 249067352
Queueing strategy: Class-based queueing
Output queue: 0/40 (size/max)
5 minute input rate 41672000 bits/sec, 6894 packets/sec
5 minute output rate 65358000 bits/sec, 8267 packets/sec
69357766 packets input, 54278801831 bytes, 0 no buffer
Received 1362 broadcasts (1226 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 1226 multicast, 0 pause input
0 input packets with dribble condition detected
97533479 packets output, 108964531997 bytes, 0 underruns
249067352 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

As you can see from the ping below, which sends traffic through the link described above, there seem to be no connectivity issues, even though the output drops counter suggests otherwise.

XXXXXX#ping 4.2.2.1 repeat 1000 size 100
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 4.2.2.1, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (1000/1000), round-trip min/avg/max = 10/14/30 ms

Could this be a bug?
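A quick way to sanity-check these counters is to compare "Total output drops" with the transmit counters from the same output. The Python sketch below is only an illustration: the helper name `parse_output_counters` is made up, and reading the drop counter as bytes rather than packets is an assumption (suggested by the drops exceeding the packets sent, and by the "Queuing stats are in bytes" note in the platform output later in this thread), not something the output itself states.

```python
import re

# Counter lines copied from the "show interface" output above.
SHOW_INT = """\
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 249067352
97533479 packets output, 108964531997 bytes, 0 underruns
"""

def parse_output_counters(text):
    """Return (drops, packets_out, bytes_out) parsed from show interface text."""
    drops = int(re.search(r"Total output drops: (\d+)", text).group(1))
    sent = re.search(r"(\d+) packets output, (\d+) bytes", text)
    return drops, int(sent.group(1)), int(sent.group(2))

drops, pkts, nbytes = parse_output_counters(SHOW_INT)

# Interpretation 1: the drop counter counts packets.
ratio_if_packets = drops / pkts
# Interpretation 2 (assumption): the drop counter counts bytes.
ratio_if_bytes = drops / nbytes

print(f"drops per transmitted packet: {ratio_if_packets:.2f}")
print(f"drops relative to transmitted bytes: {ratio_if_bytes:.4%}")
```

Read as packets, the counter would mean roughly 2.5 drops for every packet sent, which is hard to square with a 100 percent ping success rate. Read as bytes, it is about 0.2 percent of the transmitted volume.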


We see the exact same issue...  I can also say it still shows up on version 3.7.4

But this is not the case: before the 3850 was installed, a 3750X had been running here for the last 3 years, and there were never any problems with the cabling.

Don't get me wrong here, I'm not saying you aren't being "professional", but this is the inherent "risk" of using copper lines (versus fibre optic).  It may have worked in the past, but pulling the cable out could've jarred one of the pins loose.

Sorry Leo... that's not what I intended, I was just pointing out some of the troubleshooting steps that we carried out :)

The same output errors are seen on fibre optic links.  

We have two 3850s in a stack, and the problem is seen on the port forwarding the most traffic on the EtherChannel.  For example, ports te1/0/1 and te2/0/1 are bundled into the same 802.3ad EtherChannel. Te1/0/1 showed the symptom, so we replaced the SFP, but this didn't make a difference.  We then unplugged the te1/0/1 link so that traffic would go through te2/0/1, which hadn't had any drops before.  As soon as we did this, we started seeing output drops on te2/0/1.

   

Sorry Leo... that's not what I intended, I was just pointing out some of the troubleshooting steps that we carried out :)

Apologies not required.  No harm done.  

The same output errors are seen on fibre optic links.

What outputs?  Post the complete output to the command "sh controller e <BLAH>".  I want to see if the errors are incoming or outgoing packets.

I have uploaded the output of the following command:

XXXXXX#sh controllers ethernet-controller te1/0/1

As far as I'm aware, I do not see anything wrong with the output.  No line errors.

Hi Leo, I too am stumped by the output on the interfaces.

We see total output drops, output errors and Excess Defer frames in the sh controllers output.

I am unable to explain why we have the errors on certain links.

Ok, now post the output to the command "sh interface ten 1/0/1".

Here's the current output of show interface:

XXXXXX#sh int te1/0/1
TenGigabitEthernet1/0/1 is up, line protocol is up (connected)
Hardware is Ten Gigabit Ethernet, address is 00cc.fc68.f681 (bia 00cc.fc68.f681)
Description: VLAN 599 XXXXXXX
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 139/255, txload 11/255, rxload 10/255
Encapsulation ARPA, loopback not set
Keepalive not set
Full-duplex, 1000Mb/s, link type is auto, media type is 10/100/1000BaseTX SFP
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:28, output never, output hang never
Last clearing of "show interface" counters 1d09h
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 952461192
Queueing strategy: Class-based queueing
Output queue: 0/40 (size/max)
5 minute input rate 42660000 bits/sec, 6693 packets/sec
5 minute output rate 45738000 bits/sec, 6374 packets/sec
339603369 packets input, 252803866753 bytes, 0 no buffer
Received 4431 broadcasts (3978 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 3978 multicast, 0 pause input
0 input packets with dribble condition detected
387892709 packets output, 388077218045 bytes, 0 underruns
952461192 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
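The two snapshots of te1/0/1 in this thread also give a rough growth rate for the drop counter. This is a minimal Python sketch under two assumptions: the counters were not cleared between the snapshots (the "Last clearing" values of 10:12:51 and 1d09h are consistent with that), and the counter is in bytes, which the output does not state. The helper name `uptime_to_seconds` is made up for illustration, and the 1d09h value is only accurate to the hour.

```python
import re

def uptime_to_seconds(text):
    """Convert IOS-style durations such as '10:12:51' or '1d09h' to seconds."""
    m = re.fullmatch(r"(\d+):(\d+):(\d+)", text)
    if m:
        hours, minutes, seconds = map(int, m.groups())
        return hours * 3600 + minutes * 60 + seconds
    m = re.fullmatch(r"(\d+)d(\d+)h", text)
    if m:
        days, hours = map(int, m.groups())
        return days * 86400 + hours * 3600
    raise ValueError(f"unrecognised duration: {text!r}")

# (time since counters were cleared, Total output drops) from the two outputs.
first = (uptime_to_seconds("10:12:51"), 249067352)
second = (uptime_to_seconds("1d09h"), 952461192)

elapsed = second[0] - first[0]
rate = (second[1] - first[1]) / elapsed   # counter units per second

print(f"elapsed: {elapsed} s, drop counter growth: {rate:.0f} per second")
# Assumption: if the counter is in bytes, this is the dropped bit rate.
print(f"as bytes: about {rate * 8 / 1000:.1f} kbit/s dropped")
```

If the byte reading holds, the dropped rate is a small fraction of the 45 to 65 Mbit/s five-minute output rates shown in the two outputs.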

I'm getting confused here.  I thought there was another interface involved but it turns out it is the same one. 

Was the patch cable replaced yet?

Hi Leo, yes, there are 4 links showing this symptom. We have ruled out a problem with the patch cable since it works fine when connected to the old 3750X.

Similar problem here, but reliability looks solid so far (255/255).  Packet drop ratios seem to be commensurate with traffic volume on the affected interfaces, but our overall throughput on any given interface tops out at 25%, so we're pretty sure it's not specifically a matter of high traffic.

Forget it, we see the same behaviour on several stacks that were upgraded to an IOS higher than 3.2.x... I would be surprised if 100 cables were suddenly defective... Look for similar postings in the forum... that's a known issue with the 3850 and IOS 3.6.x and above.

zatchmo04
Level 1

Any resolution to these issues?  I am having the same problem.  It began occurring after the upgrade to 03.06.05.E.  I noticed it initially on WAN interfaces that are limited to 10Mb by the WAN provider.  TAC chalked it up to spiky traffic, and we played with QoS policies and queue buffer ratios trying to get the errors down, with only intermittent success.

This week I connected two access layer switches (a 2960S and a 2960X) to copper 1Gig interfaces on the 3850.  There is relatively little traffic going across these links (peak of 40Mbps today), but I am getting the same continuous output errors.  It appears to be related to queue thresholds being exceeded, but there are no policies assigned to these interfaces, so that doesn't really make sense.

See output below:

sh platform qos queue stats gigabitEthernet 1/0/22
DATA Port:16 Enqueue Counters
-------------------------------
Queue Buffers Enqueue-TH0 Enqueue-TH1 Enqueue-TH2
----- ------- ----------- ----------- -----------
    0       0           0     1264043    43993438
    1       0           0           0  5999610234
    2       0           0           0           0
    3       0           0           0           0
    4       0           0           0           0
    5       0           0           0           0
    6       0           0           0           0
    7       0           0           0           0
DATA Port:16 Drop Counters
-------------------------------
Queue Drop-TH0    Drop-TH1    Drop-TH2    SBufDrop    QebDrop
----- ----------- ----------- ----------- ----------- -----------
    0           0           0           0           0           0
    1           0           0     6344068           0           0
    2           0           0           0           0           0
    3           0           0           0           0           0
    4           0           0           0           0           0
    5           0           0           0           0           0
    6           0           0           0           0           0
    7           0           0           0           0           0
 AQM Broadcast Early WTD COUNTERS(In terms of Bytes)
--------------------------------------------------
  PORT TYPE          ENQUEUE             DROP
--------------------------------------------------
 UPLINK PORT-0        N/A               0
 UPLINK PORT-1        N/A               0
 UPLINK PORT-2        N/A               0
 UPLINK PORT-3        N/A               0
 NETWORK PORTS        6331               6331
 RCP PORTS               0                  0
 CPU PORT                0                  0
Note: Queuing stats are in bytes
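To put the drop counters above in proportion, the per-queue totals can be compared directly. The Python sketch below is only an illustration: `parse_queue_stats` is a made-up helper, only queues 0 and 1 are copied in, and treating "enqueued plus dropped" as the offered load is an assumption about how the platform counts, not something the output confirms.

```python
# Rows for queues 0 and 1 copied from the output above (separator lines omitted).
QUEUE_STATS = """\
DATA Port:16 Enqueue Counters
Queue Buffers Enqueue-TH0 Enqueue-TH1 Enqueue-TH2
    0       0           0     1264043    43993438
    1       0           0           0  5999610234
DATA Port:16 Drop Counters
Queue Drop-TH0    Drop-TH1    Drop-TH2    SBufDrop    QebDrop
    0           0           0           0           0           0
    1           0           0     6344068           0           0
"""

def parse_queue_stats(text):
    """Return ({queue: enqueued_total}, {queue: dropped_total}) in bytes."""
    enq, drop = {}, {}
    current = None
    for line in text.splitlines():
        if "Enqueue Counters" in line:
            current = enq
        elif "Drop Counters" in line:
            current = drop
        else:
            fields = line.split()
            # Only purely numeric rows are data; header rows are skipped.
            if current is not None and fields and all(f.isdigit() for f in fields):
                queue = int(fields[0])
                counters = [int(f) for f in fields[1:]]
                if current is enq:
                    counters = counters[1:]  # first column is "Buffers", not a counter
                current[queue] = sum(counters)
    return enq, drop

enq, drop = parse_queue_stats(QUEUE_STATS)
for q in sorted(enq):
    offered = enq[q] + drop[q]  # assumption: enqueue counts only accepted bytes
    share = drop[q] / offered if offered else 0.0
    print(f"queue {q}: enqueued={enq[q]} dropped={drop[q]} drop share={share:.4%}")
```

On these numbers queue 1 drops roughly 0.1 percent of what it was offered, all at threshold TH2, while queue 0 drops nothing.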

Same here. I upgraded from 03.02.02 to 03.06.05. There were no issues before; I know for sure because I check every switch monthly. Since the upgrade, the output error counters have started to go crazy. No idea why. The network seems to perform just fine.
