Output packet drops at the 3850

AllertGen · ‎04-02-2017

Hello, everyone.

Can anyone help troubleshoot output packet drops on a 3850 switch? I see a lot of packet drops on some interfaces even at a low traffic rate (at most 100 Mbit/s on 1G ports).

GigabitEthernet1/0/26 is up, line protocol is up (connected) 
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 165898
  Output queue: 0/40 (size/max)
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     165898 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
GigabitEthernet2/0/26 is up, line protocol is up (connected) 
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 18264
  Output queue: 0/40 (size/max)
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     18264 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops

interface Port-channel36
 switchport
 switchport access vlan 1
 switchport trunk native vlan 1
 switchport private-vlan trunk encapsulation dot1q
 switchport private-vlan trunk native vlan tag
 switchport mode trunk
 no switchport nonegotiate
 no switchport protected
 no switchport block multicast
 no switchport block unicast
 no ip arp inspection trust
 ip arp inspection limit rate 15 burst interval 1
 ip arp inspection limit rate 15
 load-interval 300
 carrier-delay 2
 no shutdown
 ipv6 mld snooping tcn flood
 ipv6 mfib forwarding input
 ipv6 mfib forwarding output
 ipv6 mfib cef input
 ipv6 mfib cef output
 snmp trap mac-notification change added
 snmp trap mac-notification change removed
 snmp trap link-status
 arp arpa
 arp timeout 14400
 spanning-tree guard root
 spanning-tree port-priority 128
 spanning-tree cost 0
 hold-queue 2000 in
 hold-queue 40 out
 ip igmp snooping tcn flood
 no ip dhcp snooping information option allow-untrusted
 no bgp-policy accounting input
 no bgp-policy accounting output
 no bgp-policy accounting input source
 no bgp-policy accounting output source
 no bgp-policy source ip-prec-map
 no bgp-policy source ip-qos-map
 no bgp-policy destination ip-prec-map
 no bgp-policy destination ip-qos-map

interface GigabitEthernet1/0/26
 switchport
 switchport access vlan 1
 switchport private-vlan trunk encapsulation dot1q
 switchport private-vlan trunk native vlan tag
 switchport mode trunk
 no switchport nonegotiate
 no switchport protected
 no switchport block multicast
 no switchport block unicast
 no ip arp inspection trust
 ip arp inspection limit rate 15 burst interval 1
 ip arp inspection limit rate 15
 load-interval 300
 carrier-delay 2
 no shutdown
 ipv6 mld snooping tcn flood
 ipv6 mfib forwarding input
 ipv6 mfib forwarding output
 ipv6 mfib cef input
 ipv6 mfib cef output
 snmp trap mac-notification change added
 snmp trap mac-notification change removed
 snmp trap link-status
 cts role-based enforcement
 cdp tlv location
 cdp tlv server-location 
 cdp tlv app
 arp arpa
 arp timeout 14400
 channel-group 36 mode active
 spanning-tree guard root
 spanning-tree port-priority 128
 spanning-tree cost 0
 hold-queue 2000 in
 hold-queue 40 out
 ip igmp snooping tcn flood
 no ip dhcp snooping information option allow-untrusted
 no bgp-policy accounting input
 no bgp-policy accounting output
 no bgp-policy accounting input source
 no bgp-policy accounting output source
 no bgp-policy source ip-prec-map
 no bgp-policy source ip-qos-map
 no bgp-policy destination ip-prec-map
 no bgp-policy destination ip-qos-map

interface GigabitEthernet2/0/26
 switchport
 switchport access vlan 1
 switchport private-vlan trunk encapsulation dot1q
 switchport private-vlan trunk native vlan tag
 switchport mode trunk
 no switchport nonegotiate
 no switchport protected
 no switchport block multicast
 no switchport block unicast
 no ip arp inspection trust
 ip arp inspection limit rate 15 burst interval 1
 ip arp inspection limit rate 15
 load-interval 300
 carrier-delay 2
 no shutdown
 ipv6 mld snooping tcn flood
 ipv6 mfib forwarding input
 ipv6 mfib forwarding output
 ipv6 mfib cef input
 ipv6 mfib cef output
 snmp trap mac-notification change added
 snmp trap mac-notification change removed
 snmp trap link-status
 cts role-based enforcement
 cdp tlv location
 cdp tlv server-location 
 cdp tlv app
 arp arpa
 arp timeout 14400
 channel-group 36 mode active
 spanning-tree guard root
 spanning-tree port-priority 128
 spanning-tree cost 0
 hold-queue 2000 in
 hold-queue 40 out
 ip igmp snooping tcn flood
 no ip dhcp snooping information option allow-untrusted
 no bgp-policy accounting input
 no bgp-policy accounting output
 no bgp-policy accounting input source
 no bgp-policy accounting output source
 no bgp-policy source ip-prec-map
 no bgp-policy source ip-qos-map
 no bgp-policy destination ip-prec-map
 no bgp-policy destination ip-qos-map

The queue size on the other side is 75, so there shouldn't be any problems there. Also, I didn't have this problem before the IOS update (the previous version was 03.02.03 and the current one is 03.06.06).

Best Regards.

AllertGen · ‎04-02-2017

The drop rate is 200k packets at 10 Mbit/s of traffic. That's a lot, and I don't know why it is so high.

Could it be connected with VLANs? These ports work in trunk mode, but the switch on the other side doesn't have some of the VLANs that are present on this device.

AllertGen · ‎04-03-2017

It looks like the problem is not linked to VLANs. I see the same problem on ports in access mode too.

So, any ideas what it could be?

AllertGen · ‎04-03-2017

OK. My results so far:

It looks like the problem is connected with QoS. It seems Cisco changed the queue settings in newer versions of IOS. As far as I can see, the second queue (data traffic) on the port is dropping traffic:

#show platform qos queue stats gi 1/0/26

DATA Port:20 Drop Counters              
-------------------------------              
Queue Drop-TH0    Drop-TH1    Drop-TH2    SBufDrop    QebDrop              
----- ----------- ----------- ----------- ----------- -----------
    0           0           0           0           0           0
    1           0           0    31764522           0           0
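As a rough aid for reading this kind of output, here is a minimal Python sketch that picks out the queues and thresholds with nonzero drop counters. It assumes the table layout is exactly as pasted above (one queue number followed by five counters per row); the function name `queues_with_drops` is made up for this example and is not part of any Cisco tooling.

```python
# Sample text copied from the "show platform qos queue stats" output above.
SAMPLE = """\
Queue Drop-TH0    Drop-TH1    Drop-TH2    SBufDrop    QebDrop
----- ----------- ----------- ----------- ----------- -----------
    0           0           0           0           0           0
    1           0           0    31764522           0           0
"""

# Column names as they appear in the header row of the pasted output.
COLUMNS = ("Drop-TH0", "Drop-TH1", "Drop-TH2", "SBufDrop", "QebDrop")


def queues_with_drops(text):
    """Return {queue: {column: count}} for every nonzero drop counter."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows are exactly six integers: queue number plus five counters.
        # Header and separator rows fail the isdigit() check and are skipped.
        if len(fields) != 6 or not all(f.isdigit() for f in fields):
            continue
        queue = int(fields[0])
        counters = [int(f) for f in fields[1:]]
        nonzero = {name: n for name, n in zip(COLUMNS, counters) if n}
        if nonzero:
            result[queue] = nonzero
    return result


if __name__ == "__main__":
    print(queues_with_drops(SAMPLE))
```

For the output above, only queue 1 is reported, with all of its drops under Drop-TH2.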

As a solution I found these lines:

qos queue-softmax-multiplier 1200

ip access-list extended allTraffic
 remark --- ACL for matching all traffic ---
 permit ip any any
exit

class-map match-any cDefQos
 match access-group name allTraffic
exit

policy-map pDesQos
 class class-default
  bandwidth percent 100
 exit
exit

int [name]
 service-policy output pDesQos
exit

But so far the first line

qos queue-softmax-multiplier 1200

was enough for me. After adding this line I don't see errors on the interfaces anymore.

I'm still monitoring the ports, but there have been no problems so far.

Best Regards.

AllertGen · ‎04-04-2017

I solved the problem on all interfaces except Gi1/0/26 and Gi2/0/26. The counters are rising only on these 2 interfaces. I applied the service-policy to these interfaces but it didn't change anything.

Does anyone have any ideas? I'm bad at QoS questions...

Best Regards.

~chris · ‎04-04-2017

Hi,

Seeing output drops is a known problem on Catalyst 3850 switches. Cisco published a document where the issue is described.

http://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/200594-Catalyst-3850-Troubleshooting-Output-dr.html

The drops are caused by micro-bursts which the switch port cannot buffer (the soft buffer for queue 1 is too small).
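To illustrate why micro-bursts can cause drops at a low average rate, here is a minimal Python sketch. All numbers are assumptions chosen for illustration, not values measured on a 3850: a 10G sender feeding a 1G port, one 1 ms burst per second, and a hypothetical 300,000-byte buffer available to the queue.

```python
def burst_backlog_bytes(in_bps, out_bps, burst_us):
    """Bytes queued at the end of a burst that arrives faster than it drains."""
    excess_bps = max(0, in_bps - out_bps)
    # bits/s * microseconds / 1e6 gives bits; dividing by 8 gives bytes.
    return excess_bps * burst_us // 8_000_000


def average_bps(in_bps, burst_us, period_us):
    """Average rate seen by a counter that samples over the whole period."""
    return in_bps * burst_us // period_us


# Assumed scenario (hypothetical values, not taken from the thread).
IN_BPS = 10_000_000_000   # burst arrives at 10 Gbit/s
OUT_BPS = 1_000_000_000   # egress port drains at 1 Gbit/s
BURST_US = 1_000          # burst lasts 1 ms
PERIOD_US = 1_000_000     # one burst per second
BUFFER_BYTES = 300_000    # hypothetical buffer available to the queue

backlog = burst_backlog_bytes(IN_BPS, OUT_BPS, BURST_US)
average = average_bps(IN_BPS, BURST_US, PERIOD_US)
dropped = max(0, backlog - BUFFER_BYTES)

if __name__ == "__main__":
    print(f"average rate: {average / 1e6:.0f} Mbit/s")
    print(f"backlog at end of burst: {backlog} bytes")
    print(f"bytes that do not fit in the buffer: {dropped}")
```

Under these assumptions the interface counters would show only about 10 Mbit/s on average, yet most of each burst would not fit in the buffer. A larger soft buffer lets the queue absorb more of the burst, which is what the multiplier is meant to provide.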

QoS can help to reduce the drops but will not fix the problem completely. You may still see drops, but in another queue.

To increase the soft buffer of the ports, you can use the "qos queue-softmax-multiplier" command.

Since all ports of the Catalyst 3850 use a shared buffer on an as-needed basis, I thought it could be a problem to change the soft-buffer value to the maximum of 1200.

A few days ago I got an answer from TAC that this would have no negative impact. A value of 1200 is OK.

On releases prior to 3.6.6 or 3.7.5 you have to attach a service-policy for the soft-buffer increase to apply. From 3.6.6 or 3.7.5 on, increasing the buffer takes effect without a service-policy.

regards

~chris

AllertGen · ‎04-05-2017

Hi, Chris.

Thanks for your reply.

I did see this document. But it says this:

Output drops are generally a result of interface oversubscription caused by many to one or a 10gig to 1gig transfer.

But in my case I see a drop rate of 150k packets at 50 Mbit/s of traffic (on 1G ports in full duplex). At such a speed the buffer shouldn't be used at all.

And as you said, I can see drops in another queue. Before applying the service-policy to the interface I saw drops in queue 1. But after applying the policy (with only the default class at 100% bandwidth), I now see drops in queue 0.

So should I add one more class with "priority level 0 percent ..."?

Best Regards.

~chris · ‎04-12-2017

Hi,

Depending on your service-policy, you will disable queue 1 and all traffic flows into queue 0. That's why you see dropped traffic in that queue now.

Did you also change the soft buffer to a higher value?

What version of IOS-XE are you running on the box? 3.6.6 / 3.7.5?

regards

~chris

AllertGen · ‎04-12-2017

Hi, Chris.

The IOS version is 03.06.06.

The current QoS changes are:

qos queue-softmax-multiplier 1200

ip access-list extended allTraffic
 remark --- ACL for matching all traffic ---
 permit ip any any

class-map match-any cDefQos
 match access-group name allTraffic

policy-map pDesQos
 class class-default
  bandwidth percent 100

int [name]
 service-policy output pDesQos

Best Regards.

~chris · ‎04-12-2017

Hi,

Yes, looks good.

But on 3.6.6 I would prefer not to use a service-policy and just increase the soft buffer to the maximum value of 1200.

Why? As I wrote above, when using this service-policy, queue 1 is disabled and all traffic (control traffic and normal traffic) is combined in queue 0. There will be no default "classification".

Whichever you use, do you still see drops?

regards

~chris

AllertGen · ‎04-19-2017

Hi, c.edel.

Sorry for the late response. I see drops in both cases: with the service-policy and without it. With it I see drops in queue 0, and without it I see drops in queue 1.

Best Regards.

divanko · ‎01-19-2018

3. If you define only 1 class-default, in order to tweak the buffer, all the traffic falls under the single queue (including control packets). Be advised that when all traffic is put in one queue, there is no classification between control and data traffic and during time of congestion, control traffic could get dropped. So, it is recommended to create at least 1 other class for control traffic. CPU generated control-packets will always go to the first priority queue even if not matched in the class-map. If there is no priority queue configured, it would go to the first queue of the interface, which is queue-0.

Source: https://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/200594-Catalyst-3850-Troubleshooting-Output-dr.html

Spiky.Steve · ‎10-02-2018

Did you add this to both ends of the EtherChannel or just on the 3850 end?

c.church · ‎11-03-2022

I'm curious if anyone has run into this issue with more modern IOS such as 16.x? We were seeing heavy drops on a 4 port 1G etherchannel, less than 30 megabit out per member. We applied qos queue-softmax-multiplier 1200 to global config, and the drops decreased dramatically. The Po int went from 24 million drops every half hour down to a million every day. The lack of any 'show platform qos ....' commands makes it difficult in this IOS. I've yet to find the replacement syntax if there is one.
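To put the two figures above on the same scale, here is a small Python sketch that converts them to drops per second. It only uses the numbers quoted in the post (24 million drops per half hour before, 1 million per day after) and treats both as steady averages, which is an assumption.

```python
# Figures quoted in the post above, treated as steady averages (assumption).
DROPS_BEFORE = 24_000_000          # drops per half hour before the change
DROPS_AFTER = 1_000_000            # drops per day after the change

HALF_HOUR_S = 30 * 60              # 1800 seconds
DAY_S = 24 * 60 * 60               # 86400 seconds

before_per_s = DROPS_BEFORE / HALF_HOUR_S
after_per_s = DROPS_AFTER / DAY_S
reduction = before_per_s / after_per_s

if __name__ == "__main__":
    print(f"before: {before_per_s:.0f} drops/s")
    print(f"after:  {after_per_s:.1f} drops/s")
    print(f"reduction factor: about {reduction:.0f}x")
```

That is roughly 13,000 drops per second before the change against about 12 per second after it, a reduction of more than a thousand times.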

Thanks,

Chuck

divanko · ‎11-03-2022

Chuck,

Did you find this information by reading this document:

https://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/200594-Catalyst-3850-Troubleshooting-Output-dr.html

?

Thanks,

Dallas