05-10-2016 12:51 AM - edited 03-08-2019 05:40 AM
Hello guys,
Linux users connected to my access switch are complaining about latency.
I checked the switch interfaces. Does anyone know why I have output drops, please?
Interface counters | User PC speed configuration | Switch interface
127 input errors, 120 CRC, output drops: 6560435 | 1 Gb | G4/0/35
65 input errors, 59 CRC, output drops: 4118716 | 1 Gb | G4/0/31
53 input errors, 49 CRC, output drops: 2072336 | 1 Gb | G4/0/28
output drops: 94122 | 1 Gb | G3/0/15
output drops: 2539 | 1 Gb | G4/0/30
148 input errors, 137 CRC, output drops: 29049107 | 1 Gb | G3/0/21
output drops: 3923988 | 1 Gb | G2/0/24
output drops: 302659 | 1 Gb | G1/0/9
Disabled | 1 Gb | G3/0/44
output drops: 2536 | 1 Gb | G3/0/14
output drops: 1350036 | 1 Gb | G1/0/22
output drops: 9428 | 1 Gb | G2/0/7
output drops: 74435 | 1 Gb | G2/0/22
output drops: 54846 | 1 Gb | G3/0/9
output drops: 5092185 | 1 Gb | G1/0/2
output drops: 159657 | 1 Gb | G2/0/15
output drops: 630711 | 1 Gb | G2/0/34
Not connected | 1 Gb | G4/0/38
72 input errors, 56 CRC, output drops: 10383796 | 1 Gb | G4/0/37
output drops: 1413181 | 1 Gb | G4/0/39
output drops: 58514 | 1 Gb | G1/0/26
output drops: 237431 | 1 Gb | G2/0/33
9162 input errors, 8676 CRC, output drops: 82234343 | 1 Gb | G1/0/39
925 input errors, 862 CRC, output drops: 1846412 | 1 Gb | G1/0/35
05-10-2016 12:59 AM
Are those input errors/CRCs incrementing constantly on each interface above that has them?
When were they last cleared?
CRCs together with input errors can mean a bad cable or NIC; it's a layer 1 issue. But you need to check whether those counters are constantly going up in real time; if they're not, then that's not the issue.
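For example, to watch one port in real time (a minimal sketch; G4/0/35 is just the first interface from your list, substitute any port):

clear counters gigabitEthernet 4/0/35
! wait a few minutes with users working, then see whether the error/drop counters are still climbing
show interfaces gigabitEthernet 4/0/35 | include errors|drops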
05-10-2016 02:06 AM
05-10-2016 02:16 AM
Output drops are completely different from CRCs and input errors; those are layer 1 problems. If you see those errors increasing after clearing the counters, you need to check the cabling: either replace it, or check it for faults using the commands below, fix the issue and retest.
test cable-diagnostics tdr interface gigabitEthernet x/x
show cable-diagnostics tdr
Output drops can come from spikes of traffic (micro-bursts caused by certain applications) or from over-utilisation of the interface itself, i.e. the buffer filling up because there is too much traffic for it to handle. Are you running QoS on these interfaces?
When they're complaining about latency, have you checked which destination they are going to, and run a traceroute to it from their PC to see whether there are any delays in the hops along the path to that destination?
I would be fixing the interfaces with incrementing CRCs and input errors before looking into output drops.
05-10-2016 02:25 AM
Hi Mark,
Thanks a lot for your answers.
What do you mean by "I would be fixing the interfaces with incrementing CRCs and input errors before looking into output drops"?
I don't see any CRCs since I cleared the interface counters.
05-10-2016 02:35 AM
OK, well that's good: if you cleared the counters and they're not coming back, that means your layer 1 is fine, and layer 1 problems would cause issues for the end users more than output drops would.
Have you calculated the number of drops against the volume of traffic on each interface? If it's less than 1% it's OK, but if it's more than that you will need to try to find what's causing it.
Can you take one of the worst interfaces and post the full "show interface gx/x", and also the "show run int gx/x" for that interface?
From that we might be able to see a bit more, e.g. whether it's over-utilised.
As well, you need to confirm there is actual latency from the user PC, and whether it happens only with certain destinations they're trying to reach or certain applications they use, or every time they're on the network; is it only on the LAN or only when crossing the WAN?
Are these interfaces constantly running hot on RX/TX? Are the users hammering the ports? These are just some things to check, as even with output drops they shouldn't really be seeing noticeable latency unless the switch is completely oversubscribed. Try to find the common factor between the users.
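A quick way to see how hot a port is running (a minimal sketch, taking G3/0/21 from your table as the example; the 30-second load-interval is optional and just makes bursts show up sooner in the rate counters):

interface GigabitEthernet3/0/21
 load-interval 30
!
show interfaces GigabitEthernet3/0/21 | include rate|load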
You have multiple modules there; it's hard to believe they would all be oversubscribed at the access layer. The issue may not be the output drops.
05-10-2016 05:06 AM
Thanks a lot for the help.
SW-FRPAU-X00I1B-AUSR#sh int g3/0/21
GigabitEthernet3/0/21 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is e05f.b9f8.4a15 (bia e05f.b9f8.4a15)
Description: *** STATION LINUX ***
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 1w6d, output 00:00:19, output hang never
Last clearing of "show interface" counters 03:39:25
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 86274
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 51000 bits/sec, 48 packets/sec
5 minute output rate 1225000 bits/sec, 92 packets/sec
10479475 packets input, 905676618 bytes, 0 no buffer
Received 0 broadcasts (0 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
18115443 packets output, 23193919694 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
sh run :
interface GigabitEthernet3/0/21
description *** STATION LINUX ***
switchport access vlan 50
switchport mode access
switchport voice vlan 850
ip arp inspection trust
ip arp inspection limit rate 100
no logging event link-status
no logging event power-inline-status
speed auto 100 1000
srr-queue bandwidth share 12 8 30 50
queue-set 2
priority-queue out
no snmp trap link-status
storm-control broadcast level 50.00
spanning-tree portfast
spanning-tree bpduguard enable
ip dhcp snooping limit rate 100
ip dhcp snooping trust
end
05-10-2016 06:12 AM
Way too many drops for roughly 3 hours of uptime. It looks like you are running QoS with priority queuing, which means everything in the priority queue is serviced first, but you have bound the interface to queue-set 2 instead of the default queue-set 1 for the module, and then given custom SRR weightings. I'm guessing you did not originally set this up; it looks like it was done for a reason if they applied queue-set 2 to the interface and gave it specific custom weightings.
These drops could be coming from the way that's set up: you're telling the interface to service the priority queue first, so drops will occur in the other queues because of this.
(The priority queue is serviced until it is empty before the other queues are serviced.)
QoS can be responsible for drops depending on the way it's configured; sometimes interfaces see far fewer drops without QoS at layer 2.
srr-queue bandwidth share 12 8 30 50
queue-set 2
priority-queue out
You need to check whether these drops are being seen by QoS on the interface. Is MLS QoS enabled globally? If it is, post the output below for that interface and we can check whether QoS is seeing the drops:
sh mls qos interface gigabitEthernet x/x statistics
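If you also want to confirm the global state and the queue-set buffer allocation (a minimal sketch, assuming a 3750/3560-style platform; the exact output varies by IOS version):

show mls qos
show mls qos queue-set 2
show mls qos interface gigabitEthernet 3/0/21 buffers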
There are a couple of things you can try to fix this:
1. Manipulate the SRR queues so that any queue seeing major drops is given a larger share. The SRR shares add up to 100% and are divided between the four queues on the interface: 12 8 30 50 = 100, and they must always total 100, but you can alter them. Say all the drops were in queue 4 but none in queue 3; as an example you could change it to 12 8 20 60 = 100 (see the sketch after this list).
2. Remove the queuing configuration and just use "mls qos trust" so there is no queue prioritisation; voice is still trusted, as the phone marks at the source.
3. Remove just the priority-queue statement to stop the queue starvation.
It's all about getting the right fit for your traffic and environment when using layer 2 QoS.
These are options only. Don't change everything at once: take a test interface, make a few changes, and when you get it right with one, roll it out to the rest.
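For example, trying option 1 on a single test interface could look like this (a sketch only, using the illustrative 12 8 20 60 split from above; pick the real weights from whatever your MLS statistics show):

interface GigabitEthernet3/0/21
 ! shift share from queue 3 to queue 4 if that is where the drops are being counted
 srr-queue bandwidth share 12 8 20 60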
As well, if you're using different modules in the switch, each can have different queuing capabilities, as in the example below; this can also guide you on what the module hardware is capable of.
#sh int g1/4/1 capabilities
GigabitEthernet1/4/1
Model: NO IDPROM
Type: unknown (4)
Speed: 1000
Duplex: full
Trunk encap. type: 802.1Q
Trunk mode: on,off,desirable,nonegotiate
Channel: yes
Broadcast suppression: percentage(0-100)
Flowcontrol: rx-(off,on,desired),tx-(off,on,desired)
Membership: static
Fast Start: yes
QOS scheduling: rx-(1q8t), tx-(1p3q8t)
QOS queueing mode: rx-(cos), tx-(cos)
CoS rewrite: yes
ToS rewrite: yes
Inline power: no
Inline power policing: no
SPAN: source/destination
UDLD yes
Link Debounce: yes
Link Debounce Time: yes
Ports-in-ASIC (Sub-port ASIC) : UNAVAILABLE
num_bus = 0, num_ports = 0
Remote switch uplink: no
Port-Security: yes
Dot1x: yes
05-10-2016 06:45 AM
05-10-2016 07:32 AM
Yes, that's correct, less than 1% is fine, but we need to make sure the correct packets are being dropped. When QoS is in place it changes things, as it introduces queuing in buffers: dropping 1% of priority traffic is not good, but if the drops are happening in the correct queues then you're OK.
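As a rough check using the G3/0/21 counters you posted (both taken since the last clearing, so they cover the same 3h39m window):

86274 total output drops / 18115443 packets output ≈ 0.0048, i.e. roughly 0.48%, which is under the 1% rule of thumb.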
Looking at that MLS output it looks good; the drops are in q4 (I know it says 3, that's just the way they are counted, 0-3, so it's the 4th queue).
That's the queue you have given the most bandwidth to (50), and it's usually the least prioritised queue in your setup, so that's where we would expect to see a few drops, as it could be bulk traffic etc.
You could increase that queue, but you risk affecting your other queues, which carry the more prioritised DSCP values such as EF and AF traffic.
Again, you could remove QoS as a test and see if it resolves their latency issue, but only on a test interface/module.
I would start looking in more detail at why these users specifically are facing latency, as said earlier. Is there a common issue, destination, application etc. they are using when they see the problem? Is the issue at layer 3 or just at the application layer?
Check what type of applications they are running locally on the PCs and whether any are known for bursty traffic, which can also cause these output drops, or whether the issue has nothing to do with this switch at all and is instead in the path they take to their destination.
05-13-2016 08:31 AM
Hello Mark,
I just found out that user ports sometimes change their speed to 10 Mb/s.
Do you know what could explain this variation of speed on the interfaces, please?
Many thanks,
05-16-2016 12:45 AM
Hey Paulinho
If I understand you right, they're changing their NICs to 10 Mb while you have the user ports configured to negotiate only 100/1000 (as above), so you're going to introduce CRC errors because the speed settings will be different on the two sides.
You need to set your port to "speed auto 10 100 1000".
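On one of the affected ports that would be (a minimal sketch, using G3/0/21 from your earlier config as the example):

interface GigabitEthernet3/0/21
 ! allow 10 Mb/s as well, so a NIC that falls back to 10 Mb can still negotiate cleanly
 speed auto 10 100 1000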
05-16-2016 01:34 AM
Hi Marc,
Is there any way to log those speed variations so I can see them when I run "show logging"?
Many thanks again,
05-16-2016 02:08 AM
If they set a speed that's not supported on the interface, it should kick them out and the port will go up/down; if there is a duplex mismatch, the error will show up in the logs anyway as a problem.
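One thing to be aware of (an assumption on my part, based only on the config you posted earlier): the port has "no logging event link-status", which suppresses the up/down syslog messages, so you would want to re-enable that if you expect to see those transitions in "show logging". A minimal sketch:

interface GigabitEthernet3/0/21
 ! re-enable link up/down syslog messages on this port (currently disabled by "no logging event link-status")
 logging event link-status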
05-16-2016 02:30 AM
I see what you mean.
But is there a way to see in the logs when an interface changes from 1000 to 100, for instance?
Thanks again for your help.