
Antony Pasteris · ‎05-10-2016

We have put a Catalyst 3850-12XS running version 03.07.03E into service and have noticed high output drops on certain ports. Auto QoS is configured on this switch; we tried removing the QoS config from the affected ports, but it didn't change anything. From a performance point of view, the switch is currently running without problems and there is no network outage.

TenGigabitEthernet1/0/1 is up, line protocol is up (connected)
Hardware is Ten Gigabit Ethernet, address is 00cc.fc68.f681 (bia 00cc.fc68.f681)
Description: VLAN 599 XXXXXXXX
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 128/255, txload 16/255, rxload 10/255
Encapsulation ARPA, loopback not set
Keepalive not set
Full-duplex, 1000Mb/s, link type is auto, media type is 10/100/1000BaseTX SFP
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:19, output never, output hang never
Last clearing of "show interface" counters 10:12:51
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 249067352
Queueing strategy: Class-based queueing
Output queue: 0/40 (size/max)
5 minute input rate 41672000 bits/sec, 6894 packets/sec
5 minute output rate 65358000 bits/sec, 8267 packets/sec
69357766 packets input, 54278801831 bytes, 0 no buffer
Received 1362 broadcasts (1226 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 1226 multicast, 0 pause input
0 input packets with dribble condition detected
97533479 packets output, 108964531997 bytes, 0 underruns
249067352 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

As you can see from the ping below, which sends traffic through the link described above, there seem to be no connectivity issues, even though the output drops counter suggests something completely different.

XXXXXX#ping 4.2.2.1 repeat 1000 size 100
Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 4.2.2.1, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (1000/1000), round-trip min/avg/max = 10/14/30 ms

Could this be a bug?
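One way to sanity-check the counters against the clean ping is to ask what loss they would imply under two readings. The sketch below uses only the numbers from the show interface output above. The byte interpretation is an assumption, not something the output states; it is suggested by the "Queuing stats are in bytes" note in the platform QoS output posted later in this thread.

```python
# Counters copied from the "show interface" output above.
output_drops = 249_067_352       # "Total output drops"
packets_out = 97_533_479         # "packets output"
bytes_out = 108_964_531_997      # "bytes" on the same line

def loss_fraction(dropped, sent):
    """Share of offered traffic that was dropped; both arguments in the same unit."""
    return dropped / (dropped + sent)

# Reading 1: the drop counter counts packets.
as_packets = loss_fraction(output_drops, packets_out)

# Reading 2 (assumption): the drop counter counts bytes.
as_bytes = loss_fraction(output_drops, bytes_out)

print(f"if packets: {as_packets:.1%} of offered traffic dropped")
print(f"if bytes:   {as_bytes:.3%} of offered traffic dropped")
```

Read as packets, the counter would mean roughly 72% loss, which is hard to square with a 1000/1000 ping. Read as bytes, it comes to about 0.23%, which would fit a link that looks healthy in everyday use.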

John Vincent · ‎10-30-2016

We see the exact same issue. I can also say it still shows up on version 3.7.4.

Leo Laohoo · ‎05-10-2016

but this is not the case since before the installation of the 3850 there was a 3750X that has been running for the last 3 years and there have never been problems with the cabling.

Don't get me wrong here, I'm not saying you aren't being "professional", but this is the inherent "risk" of using copper lines (versus fibre optic). It may have worked in the past, but pulling the cable out could've jarred one of the pins loose.

Antony Pasteris · ‎05-10-2016

Sorry Leo, that's not what I intended. I was just pointing out some of the troubleshooting steps that we carried out :)

The same output errors are seen on fibre optic links.

We have two 3850s in a stack, and the problem is seen on the port forwarding the most traffic on the EtherChannel. For example, ports te1/0/1 and te2/0/1 are bundled into the same 802.3ad EtherChannel. Te1/0/1 showed the symptom, so we replaced the SFP, but this didn't make a difference. We then unplugged the te1/0/1 link so that traffic would go through te2/0/1, which didn't have any drops before. As soon as we did this, we started seeing output drops on te2/0/1.

Leo Laohoo · ‎05-10-2016

Sorry Leo, that's not what I intended. I was just pointing out some of the troubleshooting steps that we carried out :)

Apologies not required. No harm done.

The same output errors are seen on fibre optic links.

What outputs? Post the complete output of the command "sh controller e <BLAH>". I want to see if the errors are on incoming or outgoing packets.

Antony Pasteris · ‎05-10-2016

I have uploaded the output of the following command :

XXXXXX#sh controllers ethernet-controller te1/0/1

Leo Laohoo · ‎05-10-2016

As far as I'm aware, I do not see anything wrong with the output. No line errors.

Antony Pasteris · ‎05-10-2016

Hi Leo, I am also stumped by the counters on these interfaces.

We see total output drops and output errors in the show interface output, and Excess Defer frames in the sh controllers output.

I am unable to explain why we have the errors on certain links.

Leo Laohoo · ‎05-10-2016

OK, now post the output of the command "sh interface ten 1/0/1".

Antony Pasteris · ‎05-10-2016

Here's the current output of the show interface ..

XXXXXX#sh int te1/0/1
TenGigabitEthernet1/0/1 is up, line protocol is up (connected)
Hardware is Ten Gigabit Ethernet, address is 00cc.fc68.f681 (bia 00cc.fc68.f681)
Description: VLAN 599 XXXXXXX
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 139/255, txload 11/255, rxload 10/255
Encapsulation ARPA, loopback not set
Keepalive not set
Full-duplex, 1000Mb/s, link type is auto, media type is 10/100/1000BaseTX SFP
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:28, output never, output hang never
Last clearing of "show interface" counters 1d09h
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 952461192
Queueing strategy: Class-based queueing
Output queue: 0/40 (size/max)
5 minute input rate 42660000 bits/sec, 6693 packets/sec
5 minute output rate 45738000 bits/sec, 6374 packets/sec
339603369 packets input, 252803866753 bytes, 0 no buffer
Received 4431 broadcasts (3978 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 3978 multicast, 0 pause input
0 input packets with dribble condition detected
387892709 packets output, 388077218045 bytes, 0 underruns
952461192 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
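Comparing this capture with the first one gives a rough drop rate. The sketch below assumes the counters were not cleared between the two captures (the growing packet and byte totals are consistent with that) and that "1d09h" means 33 hours, which is only accurate to the hour. The helper `elapsed_seconds` is a hypothetical name written for this example, and treating the counter as bytes is an assumption.

```python
import re

def elapsed_seconds(text):
    """Convert an IOS-style age such as '10:12:51' or '1d09h' to seconds."""
    m = re.fullmatch(r"(\d+):(\d+):(\d+)", text)
    if m:
        hours, minutes, seconds = map(int, m.groups())
        return hours * 3600 + minutes * 60 + seconds
    m = re.fullmatch(r"(\d+)d(\d+)h", text)
    if m:
        days, hours = map(int, m.groups())
        return days * 86400 + hours * 3600
    raise ValueError(f"unrecognised age format: {text!r}")

# ("Last clearing" age, "Total output drops") from the two captures in this thread.
first = (elapsed_seconds("10:12:51"), 249_067_352)
second = (elapsed_seconds("1d09h"), 952_461_192)

interval = second[0] - first[0]                      # seconds between captures
drops_per_second = (second[1] - first[1]) / interval

# Assumption: if the counter is in bytes, this is the dropped bandwidth.
dropped_bps = drops_per_second * 8
share_of_output = dropped_bps / 45_738_000           # 5 minute output rate above

print(f"{drops_per_second:,.0f} drops/s over {interval} s")
print(f"as bytes: {dropped_bps / 1000:.1f} kbit/s, {share_of_output:.2%} of output rate")
```

The counter grows by roughly 8,600 per second. As packets, that would exceed the 6,374 packets/sec the port is actually sending; as bytes, it is under 70 kbit/s, or about 0.15% of the current output rate.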

Leo Laohoo · ‎05-10-2016

I'm getting confused here. I thought there was another interface involved but it turns out it is the same one.

Was the patch cable replaced yet?

Antony Pasteris · ‎05-10-2016

Hi Leo, yes, there are 4 links showing this symptom. We have ruled out a problem with the patch cable, since it works fine when connected to the old 3750X.

trmoon-19 · ‎08-19-2016

Similar problem here, but reliability looks solid so far (255/255). Packet drop ratios seem to be commensurate with traffic volume on the affected interfaces, but our overall throughput on any given interface tops out at 25%, so we're pretty sure it's not specifically a matter of high traffic.

Ralph Laemmermeyer · ‎10-19-2016

Forget it. We see the same behaviour on several stacks that were upgraded to an IOS higher than 3.2.x. I would be surprised if 100 cables were suddenly defective. Look for similar postings in the forum; this is a known issue with the 3850 and IOS 3.6.x and above.

zatchmo04 · ‎09-27-2016

Any resolution to these issues? I am having the same problem. It began occurring after the upgrade to 03.06.05.E. I noticed it initially on WAN interfaces that are limited to 10Mb by the WAN provider. TAC chalked it up to spiky traffic, and we played with QoS policies and queue buffer ratios trying to get the errors down, with only intermittent success.

This week I connected two access layer switches (a 2960S and a 2960X) to copper 1Gig interfaces on the 3850. There is relatively little traffic going across these links (peak of 40Mbps today), but I am getting the same continuous output errors. It appears to be related to queue thresholds being exceeded, but there are no policies assigned to these interfaces. So that doesn't really make sense.

See output below:

sh platform qos queue stats gigabitEthernet 1/0/22
DATA Port:16 Enqueue Counters
-------------------------------
Queue Buffers Enqueue-TH0 Enqueue-TH1 Enqueue-TH2
----- ------- ----------- ----------- -----------
    0       0           0     1264043    43993438
    1       0           0           0 5999610234
    2       0           0           0           0
    3       0           0           0           0
    4       0           0           0           0
    5       0           0           0           0
    6       0           0           0           0
    7       0           0           0           0
DATA Port:16 Drop Counters
-------------------------------
Queue Drop-TH0    Drop-TH1    Drop-TH2    SBufDrop    QebDrop
----- ----------- ----------- ----------- ----------- -----------
    0           0           0           0           0           0
    1           0           0     6344068           0           0
    2           0           0           0           0           0
    3           0           0           0           0           0
    4           0           0           0           0           0
    5           0           0           0           0           0
    6           0           0           0           0           0
    7           0           0           0           0           0
AQM Broadcast Early WTD COUNTERS(In terms of Bytes)
--------------------------------------------------
PORT TYPE          ENQUEUE             DROP
--------------------------------------------------
UPLINK PORT-0        N/A               0
UPLINK PORT-1        N/A               0
UPLINK PORT-2        N/A               0
UPLINK PORT-3        N/A               0
NETWORK PORTS        6331               6331
RCP PORTS               0                  0
CPU PORT                0                  0
Note: Queuing stats are in bytes
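
To put the drop counters in proportion, the per-queue ratio can be worked out from the two tables above. This is a minimal sketch: the rows for queues 0 and 1 are copied from the output, and it assumes the enqueue counters count only bytes that were accepted, so offered bytes are enqueued plus dropped. `parse_rows` is a hypothetical helper for this example.

```python
# Rows for queues 0 and 1 copied from the output above (all other queues are zero).
ENQUEUE_ROWS = """
0 0 0 1264043 43993438
1 0 0 0 5999610234
"""   # columns: Queue, Buffers, Enqueue-TH0, Enqueue-TH1, Enqueue-TH2

DROP_ROWS = """
0 0 0 0 0 0
1 0 0 6344068 0 0
"""   # columns: Queue, Drop-TH0, Drop-TH1, Drop-TH2, SBufDrop, QebDrop

def parse_rows(text):
    """Map queue number to the list of integer counters that follow it."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:
            rows[int(parts[0])] = [int(value) for value in parts[1:]]
    return rows

enqueued = parse_rows(ENQUEUE_ROWS)
dropped = parse_rows(DROP_ROWS)

ratios = {}
for queue, counters in enqueued.items():
    enq_bytes = sum(counters[1:])        # skip the "Buffers" column
    drop_bytes = sum(dropped[queue])
    offered = enq_bytes + drop_bytes     # assumption: enqueue excludes drops
    ratios[queue] = drop_bytes / offered if offered else 0.0
    print(f"queue {queue}: {drop_bytes} of {offered} bytes dropped ({ratios[queue]:.3%})")
```

Under that assumption, queue 1 has dropped about 0.1% of the bytes offered to it, and queue 0 has dropped nothing.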

2044418Puts · ‎10-11-2016

Same here. I upgraded from 03.02.02 to 03.06.05. There were no issues before; I know for sure because I check every switch monthly. Since the upgrade, the output error counters have started to go crazy. No idea why. The network seems to perform just fine.

Catalyst 3850 high Total output drops and output errors