12-22-2011 07:02 PM - edited 03-04-2019 02:43 PM
We have a pair of 2960s in a stack. One port (a trunk) connects to another DC, and we are seeing ~5% packet loss and large output drops towards that DC.
#sh interfaces gigabitEthernet 1/0/17 counters errors
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
Gi1/0/17 0 0 0 0 0 182867
GigabitEthernet1/0/17 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is a0cf.5b87.ec11 (bia a0cf.5b87.ec11)
Description: QinQ_to_DC2
MTU 1998 bytes, BW 100000 Kbit, DLY 100 usec,
reliability 255/255, txload 41/255, rxload 23/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 6d13h, output 00:00:00, output hang never
Last clearing of "show interface" counters 04:02:15
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 183592
Queueing strategy: fifo
Output queue: 0/40 (size/max)
30 second input rate 9047000 bits/sec, 2075 packets/sec
30 second output rate 16324000 bits/sec, 2309 packets/sec
As you can see, the 30-second rate isn't excessive, but since the drops are output discards it would appear we are being hit by the small-buffers/microburst issue.
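(For reference, the per-port ASIC drop counters shown below can be pulled with something like "show platform port-asic stats drop gi1/0/17"; the exact syntax varies a little between platforms and IOS versions.)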
Gi1/0/17 is mapped to asic 0/20
Gi1/0/17 17 17 17 0/20 1 17 17 local Yes Yes
Port-asic Port Drop Statistics - Summary
========================================
Port 20 TxQueue Drop Stats: 308277833
And the majority appear to be in Queue 1:
Port 20 TxQueue Drop Statistics
Queue 0
Weight 0 Frames 3
Weight 1 Frames 0
Weight 2 Frames 0
Queue 1
Weight 0 Frames 308240408
Weight 1 Frames 458
Weight 2 Frames 0
Queue 2
Weight 0 Frames 37898
Weight 1 Frames 0
Weight 2 Frames 0
Queue 3
Weight 0 Frames 91
Weight 1 Frames 0
Weight 2 Frames 0
Queue 4
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 5
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 6
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 7
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Done a bit of research, and as we have mls qos configured (we have some ssh/rdp policies in place on access ports), we need to look at tweaking the buffer allocations on the switch to hopefully mitigate (reduce) these drops.
There appears to be a range of recommendations when it comes to these tweaks. Hoping someone has some suggestions on what to set with "mls qos queue-set output" to alleviate the drops (start conservative, then apply more aggressive settings if needed)... and also, does adjusting the buffers require an outage window?
Our traffic is primarily backup (replication, which is very bursty) and Internet.
Thanks in advance.
12-22-2011 08:07 PM
Hello,
We can see that most traffic is being dropped within queue 2 (the show commands number the egress queues 0-3, so the drops shown against "queue 1" in the statistics correspond to queue 2 in the queue-set configuration).
You can check the following commands to see which DSCP/CoS values are mapping into that queue, and then possibly adjust it:
sho mls qos int gi1/0/17 stat
show mls qos maps dscp-cos
sh mls qos maps cos-output-q
Also check the current queue allocation with "sh mls qos int gi1/0/17",
and possibly increase queue 2 with:
mls qos queue-set output 1 buffers 10 40 25 25 - then tune it differently based on the new outputs if needed.
You can also change thresholds to:
mls qos queue-set output 1 threshold 2 500 500 50 500
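For anyone following along, a quick sketch of what these knobs mean and how to verify them, assuming the trunk in this thread: the four values after "buffers" are percentages of the egress buffer pool given to queues 1-4 (the examples in this thread all total 100), and the four threshold values are drop-threshold1, drop-threshold2, reserved and maximum, each a percentage of that queue's allocation. After applying the changes you can confirm them with:
! show the buffer allocation and thresholds currently bound to queue-set 1
show mls qos queue-set 1
! watch the per-queue enqueue/drop counters on the trunk
show mls qos interface gi1/0/17 statistics
Both queue-set commands are global (they affect every port using queue-set 1) and are normally applied on the fly, so a reload should not be needed, though a quiet window is still prudent.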
Hope this helps,
Nik
12-22-2011 08:23 PM
Thanks Nik - results of the requested show commands:
#sho mls qos int gi1/0/17 stat
GigabitEthernet1/0/17 (All statistics are in packets)
dscp: incoming
-------------------------------
0 - 4 : 55800843 0 1 0 38
5 - 9 : 0 0 0 1071079724 0
10 - 14 : 1 0 0 0 0
15 - 19 : 0 10 0 0 0
20 - 24 : 0 0 0 0 2
25 - 29 : 0 1816074149 0 0 0
30 - 34 : 0 0 0 0 0
35 - 39 : 0 0 0 0 0
40 - 44 : 3 0 0 0 0
45 - 49 : 0 0 0 108865 0
50 - 54 : 0 0 0 0 0
55 - 59 : 0 0 0 0 0
60 - 64 : 0 0 0 0
dscp: outgoing
-------------------------------
0 - 4 : 2161609365 4528 2468576816 861 31334116
5 - 9 : 143 500 815 330708481 450
10 - 14 : 15721 0 13672 0 2285
15 - 19 : 2 10955553 128 2739 2
20 - 24 : 35 3 201 27 62641
25 - 29 : 26 37037176 1 46 2
30 - 34 : 3 2 945 1 405
35 - 39 : 1 5 1 34 2
40 - 44 : 194 11 2 2 10140
45 - 49 : 0 222214 1 522172 6
50 - 54 : 17 2 1 1 5
55 - 59 : 2 56 1 36540 3
60 - 64 : 2 3 0 6116
cos: incoming
-------------------------------
0 - 4 : 3748217471 1072163134 10 1816076438 0
5 - 7 : 3 108865 528
cos: outgoing
-------------------------------
0 - 4 : 2422389883 330740734 10958766 37099949 1394
5 - 7 : 232564 522207 840938
output queues enqueued:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 232564 0 0
queue 1: 2756394379 108784 800685
queue 2: 48058748 0 0
queue 3: 566322 0 0
output queues dropped:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 3 0 0
queue 1: 309661435 465 0
queue 2: 37912 0 0
queue 3: 91 0 0
Policer: Inprofile: 0 OutofProfile: 0
#show mls qos maps dscp-cos
Dscp-cos map:
d1 : d2 0 1 2 3 4 5 6 7 8 9
---------------------------------------
0 : 00 00 00 00 00 00 00 00 01 01
1 : 01 01 01 01 01 01 02 02 02 02
2 : 02 02 02 02 03 03 03 03 03 03
3 : 03 03 04 04 04 04 04 04 04 04
4 : 05 05 05 05 05 05 05 05 06 06
5 : 06 06 06 06 06 06 07 07 07 07
6 : 07 07 07 07
#sh mls qos maps cos-output-q
Cos-outputq-threshold map:
cos: 0 1 2 3 4 5 6 7
------------------------------------
queue-threshold: 2-1 2-1 3-1 3-1 4-1 1-1 4-1 4-1
#sh mls qos int gi1/0/17
GigabitEthernet1/0/17
trust state: not trusted
trust mode: not trusted
trust enabled flag: ena
COS override: dis
default COS: 0
DSCP Mutation Map: Default DSCP Mutation Map
Trust device: none
qos mode: port-based
Based on the above output, do you still recommend:
"mls qos queue-set output 1 buffers 10 40 25 25"
as a starting point? And will adjusting the buffers require an outage window?
Thanks again for your assistance.
12-22-2011 10:19 PM
Hey,
So we have this:
output queues enqueued:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 232564 0 0
queue 1: 2756394379 108784 800685
queue 2: 48058748 0 0
queue 3: 566322 0 0
Thus I would do the following buffers:
mls qos queue-set output 1 buffers 10 50 30 10
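(As a rough sanity check against the enqueued counters above: "queue 1" in the statistics, i.e. queue 2 in the configuration, accounts for roughly 2.76 billion of the ~2.8 billion enqueued frames, about 98%, which is why it gets the largest share of the buffer pool here.)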
Regards,
12-22-2011 10:44 PM
Thanks Nik - I had adjusted as per your previous post, and packet loss actually increased! I then checked link utilisation, and after your recommended changes utilisation had increased to 90 Mbit/sec! I rate-limited the offending backup traffic to 30 Mbit, and the packet loss is gone and the drops have reduced dramatically. (We had tried this previously, even rate-limiting the backup traffic to 10 Mbit, but the packet loss/drops only reduced slightly.) So a very good outcome! I will apply your suggestion above, and hopefully it improves even more!
Thanks again!
12-22-2011 11:47 PM
Glad it gave some good results from the first try. QoS is always subject to fine tuning.
Regards
12-23-2011 01:49 PM
Hi Nik - Just an update: if I rate-limit the backup (replication) traffic to 20 Mb I see no packet loss, but if I increase it to 40 Mb I start to see packet loss across the link again... do you have any further suggestions on fine-tuning the buffers to allow the backup/replication traffic to run at faster speeds?
Cheers
12-25-2011 07:27 PM
Hi John,
Can you please collect the outputs we captured before, with the 40 Mb rate limiter in place, to see which queue is dropping.
BTW, Paolo's suggestion below can also be valid - if you don't have traffic of a particular priority, sometimes disabling QoS can improve the situation. Anyway, as we still have some room for buffer tuning, we can try that. So I would appreciate it if you could attach the outputs of the commands requested before once again.
Nik
12-26-2011 03:55 PM
Thanks Nik - As requested:
#sho mls qos int gi1/0/17 stat
GigabitEthernet1/0/17 (All statistics are in packets)
dscp: incoming
-------------------------------
0 - 4 : 187717358 0 1 0 38
5 - 9 : 0 0 0 1147706354 0
10 - 14 : 1 0 0 0 0
15 - 19 : 0 11 0 0 0
20 - 24 : 0 0 0 0 2
25 - 29 : 0 1831834072 0 0 0
30 - 34 : 0 0 0 0 0
35 - 39 : 0 0 0 0 0
40 - 44 : 3 0 0 0 0
45 - 49 : 0 0 0 111307 0
50 - 54 : 0 0 0 0 0
55 - 59 : 0 0 0 0 0
60 - 64 : 0 0 0 0
dscp: outgoing
-------------------------------
0 - 4 : 993879429 4583 2486230550 862 32074437
5 - 9 : 146 504 816 338168174 459
10 - 14 : 15764 0 13701 0 2319
15 - 19 : 2 10979560 128 4103 2
20 - 24 : 80 3 201 27 62901
25 - 29 : 26 38571868 1 46 2
30 - 34 : 3 2 946 1 3541
35 - 39 : 1 5 1 34 2
40 - 44 : 194 11 2 2 10141
45 - 49 : 0 222216 1 576619 6
50 - 54 : 17 2 1 1 5
55 - 59 : 2 56 1 36612 3
60 - 64 : 2 3 0 6126
cos: incoming
-------------------------------
0 - 4 : 396051087 1148789637 11 1831834074 0
5 - 7 : 3 111307 550
cos: outgoing
-------------------------------
0 - 4 : 3811637267 338200419 10984104 38634849 4531
5 - 7 : 232567 576653 873865
output queues enqueued:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 232567 0 0
queue 1: 4153234309 114000 833566
queue 2: 49618986 0 0
queue 3: 623987 0 0
output queues dropped:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 3 0 0
queue 1: 317860776 465 53
queue 2: 37913 0 0
queue 3: 124 0 0
Policer: Inprofile: 0 OutofProfile: 0
#show mls qos maps dscp-cos
Dscp-cos map:
d1 : d2 0 1 2 3 4 5 6 7 8 9
---------------------------------------
0 : 00 00 00 00 00 00 00 00 01 01
1 : 01 01 01 01 01 01 02 02 02 02
2 : 02 02 02 02 03 03 03 03 03 03
3 : 03 03 04 04 04 04 04 04 04 04
4 : 05 05 05 05 05 05 05 05 06 06
5 : 06 06 06 06 06 06 07 07 07 07
6 : 07 07 07 07
#sh mls qos maps cos-output-q
Cos-outputq-threshold map:
cos: 0 1 2 3 4 5 6 7
------------------------------------
queue-threshold: 2-1 2-1 3-1 3-1 4-1 1-1 4-1 4-1
#sh mls qos int gi1/0/17
GigabitEthernet1/0/17
trust state: not trusted
trust mode: not trusted
trust enabled flag: ena
COS override: dis
default COS: 0
DSCP Mutation Map: Default DSCP Mutation Map
Trust device: none
qos mode: port-based
We do have QoS applied to access ports giving management traffic (ssh+rdp) higher priority... so I would prefer not to remove it all as Paolo suggested, if possible.
Cheers.
12-26-2011 08:35 PM
OK, so we got more traffic in the same queue again. I would try the following, in this order:
1. Check the new bandwidth with the following threshold config:
mls qos queue-set output 1 threshold 1 3200 3200 100 3200
mls qos queue-set output 1 threshold 2 3200 3200 100 3200
mls qos queue-set output 1 threshold 3 3200 3200 100 3200
2. Then you can further adjust the queues:
mls qos queue-set output 1 buffers 5 60 25 10
or even
mls qos queue-set output 1 buffers 5 65 25 5
and see which queues are dropping with the command:
sho mls qos int gi1/0/17 stat
3. If you still see drops, you can consider switching off QoS globally and see whether FIFO output queueing improves the situation.
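A minimal sketch of that test (global commands; with mls qos disabled the switch stops classifying and acting on markings, and egress traffic is handled best-effort):
! disable QoS globally for the test
no mls qos
! ... observe loss/drops towards the other DC for a while ...
! re-enable QoS when done
mls qos
The queue-set and threshold lines should stay in the configuration while QoS is disabled, but it is worth confirming afterwards with "sh run | include mls".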
Nik
12-26-2011 08:54 PM
Thanks Nik - Adjusted as suggested:
#sh run | include mls
mls qos queue-set output 1 threshold 1 3200 3200 100 3200
mls qos queue-set output 1 threshold 2 3200 3200 100 3200
mls qos queue-set output 1 threshold 3 3200 3200 100 3200
mls qos queue-set output 1 buffers 5 65 25 5
and with the 40 Mb rate limit:
#sho mls qos int gi1/0/17 stat
GigabitEthernet1/0/17 (All statistics are in packets)
dscp: incoming
-------------------------------
0 - 4 : 262871790 0 1 0 38
5 - 9 : 0 0 0 1149948698 0
10 - 14 : 1 0 0 0 0
15 - 19 : 0 11 0 0 0
20 - 24 : 0 0 0 0 2
25 - 29 : 0 1832252709 0 0 0
30 - 34 : 0 0 0 0 0
35 - 39 : 0 0 0 0 0
40 - 44 : 3 0 0 0 0
45 - 49 : 0 0 0 111380 0
50 - 54 : 0 0 0 0 0
55 - 59 : 0 0 0 0 0
60 - 64 : 0 0 0 0
dscp: outgoing
-------------------------------
0 - 4 : 1117251944 4583 2486570792 862 32110564
5 - 9 : 146 504 816 338229400 459
10 - 14 : 15764 0 13701 0 2319
15 - 19 : 2 10981358 128 4173 2
20 - 24 : 80 3 201 27 62901
25 - 29 : 26 38640706 1 46 2
30 - 34 : 3 2 946 1 3541
35 - 39 : 1 5 1 34 2
40 - 44 : 194 11 2 2 10141
45 - 49 : 0 222216 1 577210 6
50 - 54 : 17 2 1 1 5
55 - 59 : 2 56 1 36615 3
60 - 64 : 2 3 0 6126
cos: incoming
-------------------------------
0 - 4 : 471244235 1151031981 11 1832252711 0
5 - 7 : 3 111380 551
cos: outgoing
-------------------------------
0 - 4 : 3935400094 338261645 10985972 38703687 4531
5 - 7 : 232567 577244 875689
output queues enqueued:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 232567 0 0
queue 1: 4277066131 114047 835389
queue 2: 49689692 0 0
queue 3: 624581 0 0
output queues dropped:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 3 0 0
queue 1: 319047026 466 61
queue 2: 37915 0 0
queue 3: 124 0 0
Policer: Inprofile: 0 OutofProfile: 0
Cheers
12-27-2011 01:58 AM
So that is still dropping for queue 2. We can increase it at the cost of decreasing queue 3, but it will not improve much and I guess it will start dropping again soon.
What is the speed of your interface? Can you do a "show int gi1/0/17"? Can you please also explain how you do the rate-limiting?
P.S. Did you try a test with QoS disabled completely?
Nik
12-27-2011 02:39 PM
Hi Nik,
Interface speed is 100 Mb, and the sh int output is below:
#sh int gigabitEthernet 1/0/17
GigabitEthernet1/0/17 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is a0cf.5b87.ec11 (bia a0cf.5b87.ec11)
Description: QinQ_to_DC2
MTU 1998 bytes, BW 100000 Kbit, DLY 100 usec,
reliability 255/255, txload 119/255, rxload 19/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 1w5d, output 00:00:03, output hang never
Last clearing of "show interface" counters 17:46:12
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 171602
Queueing strategy: fifo
Output queue: 0/40 (size/max)
30 second input rate 7589000 bits/sec, 3497 packets/sec
30 second output rate 46906000 bits/sec, 4617 packets/sec
169405700 packets input, 53082999025 bytes, 0 no buffer
Received 194378 broadcasts (38990 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 38990 multicast, 0 pause input
0 input packets with dribble condition detected
226258242 packets output, 285168510077 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
We have multiple VLANs running over this link; the backup/replication traffic is on one of those VLANs, which is in a VRF. I am rate-limiting it on the L3 interface, i.e.:
rate-limit input 40960000 7680000 15360000 conform-action transmit exceed-action drop
rate-limit output 40960000 7680000 15360000 conform-action transmit exceed-action drop
(This is done on a 7200)
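For context, a minimal sketch of how that CAR rate limiter might sit on the 7200 - the subinterface number, VLAN, VRF name and address below are hypothetical placeholders; only the rate-limit lines are taken from the thread:
interface GigabitEthernet0/1.100
 description Backup/replication VLAN towards the other DC
 encapsulation dot1Q 100
 ip vrf forwarding BACKUP-VRF
 ip address 192.0.2.1 255.255.255.252
 ! legacy CAR: bps, normal burst, max burst
 rate-limit input 40960000 7680000 15360000 conform-action transmit exceed-action drop
 rate-limit output 40960000 7680000 15360000 conform-action transmit exceed-action drop
(rate-limit is the older CAR syntax; an MQC policer under a policy-map would be the more current way to do the same thing.)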
12-28-2011 10:48 PM
OK, so with 100 Mb we can see some bursts causing those output drops, so we can tune queue 2 further - but we will possibly get more drops in other classes, though maybe that would be fine for your traffic.
You can also try disabling QoS to see how that works.
Nik
12-30-2011 12:28 PM
Hi Nik (Apologies for the delay in responding - Have been out of office)
If I disable QoS (no mls qos), will that be adequate? What happens to the existing buffer and output thresholds (are they still active, or will they be removed)? We also have "trust dscp" on some access ports (will this be retained, or are QoS markings simply passed through with mls qos disabled)?
We also have a service policy on access ports giving rdp+ssh higher priority (it marks them as af31).
Cheers
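For readers, a minimal sketch of the kind of access-port marking policy being described - the ACL, class and policy names and the exact match criteria are assumptions, not the original poster's config (ssh is TCP 22, RDP is TCP 3389):
ip access-list extended MGMT-TRAFFIC
 permit tcp any any eq 22
 permit tcp any any eq 3389
class-map match-any MGMT
 match access-group name MGMT-TRAFFIC
policy-map MARK-MGMT
 class MGMT
  set dscp af31
interface GigabitEthernet1/0/5
 description example access port
 service-policy input MARK-MGMT
(With QoS disabled globally these policies would no longer take effect, which is essentially what the question above is getting at.)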