3850 Output Queue Drops

Otaku78
Level 1

Hi all, I'm experiencing an issue with a WS-C3850-24S switch where a small percentage of packets are being dropped from the output queues on about half of the trunks. From my research, this can be caused by an ingress interface that offers much more bandwidth than the egress interface, but to be honest I'm not entirely sure that's what's causing it here.

Ingress on this switch is a 2 x 10Gb EtherChannel trunk. Firmware version is 03.06.06E.

I've also tried some of the fixes described on these forums regarding changing the queue parameters, but I've been unsuccessful in making those changes. I'd really appreciate it if someone could help me out!
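For reference, on 3850s running a recent enough release the usual queue-parameter knob from Cisco's output-drops guidance is the global soft-buffer multiplier; a minimal sketch (the interface name is only an example):

```
! Let each egress queue borrow more from the shared (soft) buffer pool.
! Range is 100-1200; 1200 is the value commonly suggested for bursty traffic.
3850(config)# qos queue-softmax-multiplier 1200

! Verify the per-queue softmax values afterwards (example interface):
3850# show platform qos queue config gigabitEthernet 1/0/1
```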

 

 Trunk Interface stats:

 

 reliability 255/255, txload 4/255, rxload 1/255

 Encapsulation ARPA, loopback not set
  Keepalive not set
  Full-duplex, 1000Mb/s, link type is auto, media type is 1000BaseBX-10U SFP
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:18, output never, output hang never
  Last clearing of "show interface" counters 3w5d
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 959239
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  30 second input rate 6068000 bits/sec, 2450 packets/sec
  30 second output rate 18567000 bits/sec, 1655 packets/sec
     147799652 packets input, 78574317037 bytes, 0 no buffer
     Received 405149 broadcasts (136421 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 136421 multicast, 0 pause input
     0 input packets with dribble condition detected

 237563053 packets output, 252406070491 bytes, 0 underruns
 959239 output errors, 0 collisions, 0 interface resets

 

QoS Queue stats:

DATA Port:21 Enqueue Counters
-------------------------------
Queue Buffers Enqueue-TH0 Enqueue-TH1 Enqueue-TH2
----- ------- ----------- ----------- -----------
    0       0           0           0  1192754861
    1       0           0           0   664465155
    2       0           0           0           0
    3       0           0           0           0
    4       0           0           0           0
    5       0           0           0           0
    6       0           0           0           0
    7       0           0           0           0
DATA Port:21 Drop Counters
-------------------------------
Queue Drop-TH0    Drop-TH1    Drop-TH2    SBufDrop    QebDrop
----- ----------- ----------- ----------- ----------- -----------
    0           0           0           0           0           0
    1           0           0     3619155           0           0
    2           0           0           0           0           0
    3           0           0           0           0           0
    4           0           0           0           0           0
    5           0           0           0           0           0
    6           0           0           0           0           0
    7           0           0           0           0           0
 AQM Broadcast Early WTD COUNTERS(In terms of Bytes)
--------------------------------------------------
  PORT TYPE          ENQUEUE             DROP
--------------------------------------------------
 UPLINK PORT-0        N/A               0
 UPLINK PORT-1        N/A               0
 UPLINK PORT-2        N/A               0
 UPLINK PORT-3        N/A               0
 NETWORK PORTS    21024980          140441674
 RCP PORTS               0                  0
 CPU PORT                0                  0
Note: Queuing stats are in bytes


16 Replies

Francesco Molino
VIP Alumni
Hi

Can you share your config and a quick sketch to see where this interface having drops is connected to?


Thanks
Francesco
PS: Please don't forget to rate and select as validated answer if this answered your question

Hi Francesco, hopefully this image can shed some light on the topology.

The config on the trunk interfaces linking the 3850 to the access closet switch is quite simple:

switchport trunk native vlan x
 switchport trunk allowed vlan x,x,x,x
 switchport mode trunk
 switchport nonegotiate

 

 

You could be overutilizing it, with traffic spiking on the interface at times and flooding the buffer, causing the drop counters to increment.

There's a specific Cisco troubleshooting doc for output drops on 3850s that may help you identify and fix the issue. You also have a bit of a bottleneck there, coming down to 1Gb from 4 x 1Gb with a 20Gb link behind it:

https://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/200594-Catalyst-3850-Troubleshooting-Output-dr.html
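That doc leans on the platform queue commands to confirm which queue is dropping; roughly like this (interface name is a placeholder):

```
! Per-queue buffer allocation for the congested egress port
3850# show platform qos queue config gigabitEthernet 1/0/1

! Per-queue enqueue/drop counters (the Drop-TH columns come from here)
3850# show platform qos queue stats gigabitEthernet 1/0/1

! Clear the interface counters first if you want a fresh baseline
3850# clear counters gigabitEthernet 1/0/1
```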

I don't know if you've applied any QoS; however, based on your design you likely have a bottleneck (I agree with Mark).

What's the utilization of the 20G links?

Thanks
Francesco
PS: Please don't forget to rate and select as validated answer if this answered your question

Thanks Mark and Francesco, that makes total sense; I thought that was the case.

I haven't applied any QoS in my network yet; it's all best effort.

I'm not using VoIP or video conferencing, so it's not absolutely critical at this stage, but at some point I do need to start classifying and marking the more important traffic between some servers and the client workstations.

Would it be essential to start applying QoS on my traffic, or should I simply change the queue buffers for best effort?

Sorry, QoS is not my specialty, so I apologise if I sound silly here.
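If/when you do start classifying, a minimal MQC marking sketch for the server-to-workstation traffic might look like this (the ACL, class, and policy names and the addresses are made up for illustration):

```
! Match the important server traffic (hypothetical ACL)
ip access-list extended SERVER-TRAFFIC
 permit ip host 10.1.1.10 any

class-map match-any CM-SERVERS
 match access-group name SERVER-TRAFFIC

! Mark it on ingress so downstream queuing can act on the DSCP value
policy-map PM-MARK-IN
 class CM-SERVERS
  set dscp af31

interface GigabitEthernet1/0/1
 service-policy input PM-MARK-IN
```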

Here is the utilisation on the 2 x 10Gb EtherChannel links.

 

(sh controllers util)

Port       Receive Utilization  Transmit Utilization

Te1/1/3            0                    0
Te1/1/4            0                    0

It always looks like this when I view it, so I'm assuming it's just occasional bursty traffic that's causing the output drops on the egress port. I do have a 30-second load-interval set on the port-channel interface that manages the links.

 

Tx and Rx Load always look very low.

 

TenGigabitEthernet1/1/3 is up, line protocol is up (connected)
  Hardware is Ten Gigabit Ethernet, address is 547c.6966.ae9f (bia 547c.6966.ae9f)
  MTU 1500 bytes, BW 10000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive not set
  Full-duplex, 10Gb/s, link type is auto, media type is SFP-10GBase-CX1
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:01, output never, output hang never
  Last clearing of "show interface" counters 4w6d
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 2353000 bits/sec, 602 packets/sec
  5 minute output rate 15850000 bits/sec, 1623 packets/sec
     3975312442 packets input, 5107283679684 bytes, 0 no buffer
     Received 34726967 broadcasts (27553546 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 27553546 multicast, 0 pause input
     0 input packets with dribble condition detected
     5482773919 packets output, 6821538322629 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

There's a way to track bursty traffic through Wireshark to confirm whether it's that flooding the buffer temporarily:

https://www.cisco.com/c/en/us/support/docs/lan-switching/switched-port-analyzer-span/116260-technote-wireshark-00.html

Joseph W. Doherty
Hall of Fame
I'm unfamiliar with the 3650/3850 queuing architecture, but those stats do appear to show many drops for queue 1 at threshold 1.

If this is caused by sustained congestion, there's often not much you can do on a switch other than drop packets "smarter". If it's caused by transient congestion, increasing queue sizes will often decrease the drop rate.
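On the 3850, "increasing queue sizes" translates to an egress queuing policy with queue-buffers ratios; a sketch, with illustrative values rather than a recommendation:

```
! Egress policy: give the default queue the full buffer share
policy-map PM-QUEUE-OUT
 class class-default
  queue-buffers ratio 100

! Apply it outbound on the congested trunk (example interface)
interface GigabitEthernet1/0/1
 service-policy output PM-QUEUE-OUT
```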

I have seen this at a couple of our customers after they upgraded to these new switches. If you do a Wireshark capture and see two or more identical ARP packets right in a row, it could be this bug.
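One way to spot the back-to-back duplicates in a capture is to dump the ARP fields with tshark and look for consecutive identical lines; the capture filename and addresses below are made up, but the tshark display-filter fields are real:

```shell
# Export opcode + sender/target IPs for every ARP frame (hypothetical pcap):
#   tshark -r trunk.pcap -Y arp -T fields \
#     -e arp.opcode -e arp.src.proto_ipv4 -e arp.dst.proto_ipv4 > arp.txt
# Simulated export: two identical back-to-back requests, then one reply
printf '1\t10.0.0.5\t10.0.0.1\n1\t10.0.0.5\t10.0.0.1\n2\t10.0.0.1\t10.0.0.5\n' > arp.txt
# uniq -d prints one copy of each run of repeated lines, i.e. the duplicates
uniq -d arp.txt
```

Any output at all from the last command means at least one ARP frame was seen twice in a row.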

3850 duplicates pass-through ARP packets

Bug ID: CSCur30273

Description

Symptom:
When ARP request/reply packets enter an access or trunk interface and are L2-switched to another access or trunk interface in the same VLAN, the ARP request/reply packets get duplicated. This means that when one ARP packet enters the switch, two identical ARP packets exit the switch. We have not seen IP packets getting duplicated; this only seems to affect ARP packets.

Conditions:
L2 switching with lanbase license

Workaround:
The issue is only observed with "lanbase" license. Issue is not seen with "ipbase" license.

Enable IPDT

3850-STACK#sh ip device tracking all

Global IP Device Tracking for clients = Disabled >>>>> IPDT is disabled by default on lanbase
-----------------------------------------------------------------------------------------------
IP Address MAC Address Vlan Interface Probe-Timeout State Source
-----------------------------------------------------------------------------------------------

3850-STACK(config)#int range gig1/0/19 , gig2/0/39

3850-STACK(config-if)#ip device tracking maximum ?
<0-65535> Maximum devices (0 means disabled)

3850-STACK(config-if-range)#ip device tracking maximum 20
3850-STACK(config-if-range)#end
3850-STACK#sh ip device tracking all
Global IP Device Tracking for clients = Enabled >>>>> Make sure IPDT is enabled
Global IP Device Tracking Probe Count = 3
Global IP Device Tracking Probe Interval = 30
Global IP Device Tracking Probe Delay Interval = 0
-----------------------------------------------------------------------------------------------
IP Address MAC Address Vlan Interface Probe-Timeout State Source
-----------------------------------------------------------------------------------------------

Total number interfaces enabled: 2
Enabled interfaces:
Gi1/0/19, Gi2/0/39
Further Problem Description:
none

 

Unicast ARP packets are duplicated

Bug ID: CSCuv78424

Description

Symptom:
3650/3850 duplicates unicast ARP request packets destined for its IP and sends back 2 replies for one ARP request packet sent by the host.

Conditions:
NA

Workaround:
NA

Further Problem Description:

Customer Visible


Details

Last Modified: Jun 15,2016

Status: Fixed

Severity: 2 Severe

Product: Cisco Catalyst 3850 Series Switches

 


Support Cases:

2

Known Affected Releases: 15.2(3)E

 

Known Fixed Releases: 15.2(2)E4, 15.2(2)E5, 15.2(3)E3, 16.1(1.15), 16.1.2, 16.2(0.151), 3.6(4)E, 3.6(5)E, 3.7(3)E, Denali-16.1.2

Apologies for the late reply, but you were right, burleyman, it was a bug-related issue. I upgraded to a new IOS release, 03.06.08.E.152-2.E8, and all of the output queues now report no packet drops after a 48-hour period. I'm confident the issue has been resolved.

 

Thanks to all you awesome people for your help!

Jacqueline_2016
Level 1
Level 1

I have this problem too.

Please, how did you fix it?

 

Thank you

Jackie

Hi Jackie, this issue was related to the IOS version we were using; the error counters were not actually reflecting anything in reality, it was all bug-related.

I simply upgraded to a later IOS version and the problem went away. Currently using image version 03.06.08E, an older image but very stable for us.


Hi Otaku78,

 

Thank you so much for your reply.

I am running 03.07.02E and am still getting the same error. I am worried because I have 3850s across almost the entire network:

Cisco IOS Software, IOS-XE Software, Catalyst L3 Switch Software (CAT3K_CAA-UNIVERSALK9-M), Version 03.07.02E RELEASE SOFTWARE (fc1)

 

Jackie

If it's not a bug, as noted in other posts, it might just be due to congestion on the port.