01-11-2013 09:46 AM - edited 03-07-2019 11:02 AM
Hi,
For the last 3 or 4 weeks I've been fighting what seems to be an increasing number of output queue drops on our core stack and edge switches.
(The core is a stack of five 3750s in 32-Gbps stack mode. The workgroup switches are 3560s. All are at 12.2.52.)
The workgroup switches connect directly to users. We use Nortel IP phones, with each user's PC connected through the phone. Ports auto-negotiate to 100/full.
The typical workgroup-switch port configuration for a user interface is:
switchport mode access
switchport voice vlan 100
priority-queue out
mls qos trust dscp
spanning-tree portfast
end
As a reminder, Nortel phones mark voice traffic with DSCP 46 and CoS 6, signaling traffic with DSCP 40 and CoS 5, and data traffic with DSCP 0 and CoS 0.
I have the following QoS commands configured:
mls qos map dscp-cos 46 to 6
mls qos map cos-dscp 0 8 16 24 32 40 46 56
mls qos srr-queue input bandwidth 98 2
mls qos srr-queue input buffers 95 5
mls qos srr-queue input priority-queue 2 bandwidth 1
mls qos srr-queue input dscp-map queue 1 threshold 1 40
mls qos queue-set output 1 threshold 1 3100 3100 100 3200
mls qos queue-set output 2 threshold 1 3100 3100 100 3200
mls qos queue-set output 1 buffers 10 75 5 10
mls qos queue-set output 2 buffers 10 75 5 10
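As a quick arithmetic sanity check on the configuration above, here is a hypothetical Python sketch (the function names are my own, not any Cisco tool). It assumes the command syntax exactly as typed in this post and checks only two things: that the cos-dscp map sends CoS 5 and 6 to the DSCP values the Nortel phones set (40 and 46), and that each queue-set's four buffer allocations total 100 percent, which is how I understand the buffers command is meant to be used.

```python
# Hypothetical helpers for sanity-checking "mls qos" lines as typed in this post.
# They check arithmetic only; they do not model how the switch ASIC applies them.

def parse_cos_dscp(line):
    """'mls qos map cos-dscp d0 ... d7' -> {cos: dscp} for CoS 0-7."""
    values = [int(v) for v in line.split()[4:]]
    if len(values) != 8:
        raise ValueError("expected exactly 8 DSCP values")
    return dict(enumerate(values))

def parse_buffers(line):
    """'mls qos queue-set output N buffers a1 a2 a3 a4' -> [a1, a2, a3, a4]."""
    tokens = line.split()
    start = tokens.index("buffers") + 1
    return [int(v) for v in tokens[start:]]

cos_dscp = parse_cos_dscp("mls qos map cos-dscp 0 8 16 24 32 40 46 56")
# Nortel marking from the post: voice CoS 6 / DSCP 46, signaling CoS 5 / DSCP 40
print("voice ok:", cos_dscp[6] == 46, "signaling ok:", cos_dscp[5] == 40)

for qset in (1, 2):
    alloc = parse_buffers(f"mls qos queue-set output {qset} buffers 10 75 5 10")
    print(f"queue-set {qset}: {alloc}, total {sum(alloc)}%")
```

Since the user ports shown trust DSCP rather than CoS, the cos-dscp map should mainly matter on ports that trust CoS.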
However, I have tried turning off QoS on a couple of workgroup switches (no mls qos, leaving the individual port configurations unchanged) and am still seeing drops.
Since I have disabled QoS globally on those switches (though not on the core), I am presuming these commands have no effect on their operation and therefore cannot be related to the problem. Is this correct?
With QoS turned off, one would presume the cause is general congestion, especially at the user edge, where busy PCs might contribute. So I wanted to see whether I could catch any packets building up in the output queues.
I wrote some scripts and macros that took a snapshot of 'show int' every 20 seconds or so and looked for instances of 'Queue: x/' where x was greater than zero.
After several days of watching the core stack and the few workgroup switches that most often show the behavior, I NEVER saw ANY packets in the output queues. I often saw packets in the input queue for VLAN1, and once in a great while in the input queues of Fa or Gi interfaces, but NEVER in output queues.
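The scripts weren't posted, but a minimal Python sketch of the same check might look like this. It assumes each capture is the raw text of 'show int' and that every interface block starts with a header line such as 'GigabitEthernet1/0/17 is up, ...'; the function name is hypothetical.

```python
import re

# An interface block starts with e.g. "Vlan1 is up, line protocol is up".
HEADER = re.compile(r"^(\S+) is (?:up|down|administratively down)")
# Matches "Input queue: 3/..." and "Output queue: 0/..." and captures the depth.
QUEUE = re.compile(r"(Input|Output) queue: (\d+)/")

def nonzero_queues(show_int):
    """Return (interface, direction, depth) for every queue with depth > 0."""
    hits = []
    iface = None
    for line in show_int.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            iface = header.group(1)
            continue
        for direction, depth in QUEUE.findall(line):
            if int(depth) > 0:
                hits.append((iface, direction, int(depth)))
    return hits
```

Keep in mind that a 20-second poll only sees a queue that happens to be non-empty at the instant of capture, so an empty result does not rule out short-lived buildup between samples.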
Here is an example snapshot:
reference time is D497F931.F38DA0F7 (08:12:01.951 cst Wed Jan 9 2013)
Vlan1 is up, line protocol is up
Input queue: 3/75/15/0 (size/max/drops/flushes); Total output drops: 0
Output queue: 0/40 (size/max)
GigabitEthernet1/0/17 is up, line protocol is up (connected)
Input queue: 2/75/0/0 (size/max/drops/flushes); Total output drops: 175085
Output queue: 0/40 (size/max)
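Because 'Total output drops' is cumulative since the last counter clear or reload, the 175085 above says nothing about when the drops happened. A hypothetical extension of the same snapshot approach records that counter per interface and reports only the interfaces whose count grew between two captures, so the growth can be lined up with the utilization data. The function names are my own.

```python
import re

HEADER = re.compile(r"^(\S+) is (?:up|down|administratively down)")
DROPS = re.compile(r"Total output drops: (\d+)")

def output_drops(show_int):
    """Map interface name -> 'Total output drops' value from one capture."""
    counts, iface = {}, None
    for line in show_int.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            iface = header.group(1)
            continue
        found = DROPS.search(line)
        if found and iface is not None:
            counts[iface] = int(found.group(1))
    return counts

def drop_deltas(before, after):
    """Interfaces whose drop counter increased between captures.

    Interfaces missing from 'before' count from zero; a cleared counter
    (negative delta) is skipped.
    """
    deltas = {}
    for iface, count in after.items():
        grew = count - before.get(iface, 0)
        if grew > 0:
            deltas[iface] = grew
    return deltas
```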
Additionally, when I look (via SNMP) at utilization on the interfaces showing queue drops (both core and workgroup), the drops are occurring at ridiculously low utilization levels (as low as 4 to 8%). I've tried to look for microbursts between the core and a workgroup switch whose core-side interface was experiencing drops, but haven't seen any (using Observer Suite).
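One reason a 4 to 8% figure can coexist with drops is that polled utilization is an average over the whole polling interval. Below is a minimal sketch of the usual calculation from two readings of an octet counter, for example ifHCOutOctets (64-bit) or ifOutOctets (32-bit) from IF-MIB; the function and parameter names are my own. The example shows that a 4% average over a 300-second poll on a 100 Mbps port is still 12 seconds' worth of line-rate traffic, which could in principle arrive as short bursts.

```python
def utilization_pct(octets_start, octets_end, interval_s, if_speed_bps, counter_bits=64):
    """Average utilization (%) between two octet-counter samples.

    The modulo handles a single counter wrap (use counter_bits=32 for ifOutOctets).
    """
    delta = (octets_end - octets_start) % (2 ** counter_bits)
    return delta * 8 * 100 / (interval_s * if_speed_bps)

# 150,000,000 bytes in a 300 s poll on a 100 Mbps port averages to 4%,
# yet the same traffic equals 12 s of transmission at line rate.
avg = utilization_pct(0, 150_000_000, 300, 100_000_000)
line_rate_seconds = 150_000_000 * 8 / 100_000_000
print(f"{avg:.1f}% average, {line_rate_seconds:.0f} s at line rate")
```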
I also took a look at the supervisor queue and saw this:
sho platform port-asic stats drop
All port drops were 0, but the supervisor queues showed drops:
Supervisor TxQueue Drop Statistics
Queue 0: 0
Queue 1: 0
Queue 2: 0
Queue 3: 56656
Queue 4: 0
Queue 5: 0
Queue 6: 0
Queue 7: 3386
Queue 8: 175865
Queue 9: 0
Queue 10: 4753
Queue 11: 942033
Queue 12: 0
Queue 13: 0
Queue 14: 0
Queue 15: 0
But I don't know how, or if, this ties in at all.
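One way to find out whether these supervisor counters are still climbing, and which queues account for it, is to capture 'sho platform port-asic stats drop' twice and compare. A hypothetical sketch of that comparison, assuming the 'Queue N: count' line format pasted above (the function names are my own):

```python
import re

# One "Queue N: count" line from the Supervisor TxQueue Drop Statistics section.
SUP_LINE = re.compile(r"^[ \t]*Queue[ \t]+(\d+):[ \t]+(\d+)[ \t]*$", re.MULTILINE)

def sup_drops(capture):
    """Map supervisor TxQueue number -> drop count from one capture."""
    return {int(q): int(n) for q, n in SUP_LINE.findall(capture)}

def growing_queues(before, after):
    """Queues whose drop count increased between captures, with the increase."""
    return {q: after[q] - before.get(q, 0)
            for q in sorted(after) if after[q] > before.get(q, 0)}
```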
While the queue-drop counts aren't critically high at this point, they are happening more frequently than in the past and I would like to understand what is going on...
Does anyone have a clue what could be going on? In most cases, no error counters are incrementing for these interfaces.
Is there some mechanism besides congestion that could cause output queue drops?
What should I look at next?
Thanks for any help you can give!
k
01-12-2013 02:37 PM
Any ideas anyone?
01-14-2013 10:27 AM
I reloaded the core stack this weekend. This has almost, but not entirely, eliminated the drops on the core stack.
I'm not sure how that could affect the situation other than possibly memory fragmentation. I watch the core switches' memory, and there wasn't a huge change anywhere (the levels did increase, of course). For instance, the largest contiguous I/O memory block went from around 3360-3499K to 3760-3952K across the 5 switches.
Any thoughts there?
The workgroup switches had previously been rebooted, so even if the reboot takes care of the core, I still have those to look into.
01-14-2013 10:36 AM
Here are the controller ASIC stats, in case they help anyone. I'm not sure what they're saying, so any interpretations would be appreciated.
===========================================================================
Switch 1, PortASIC 0 Statistics
---------------------------------------------------------------------------
0 RxQ-0, wt-0 enqueue frames 0 RxQ-0, wt-0 drop frames
33957167 RxQ-0, wt-1 enqueue frames 0 RxQ-0, wt-1 drop frames
0 RxQ-0, wt-2 enqueue frames 0 RxQ-0, wt-2 drop frames
0 RxQ-1, wt-0 enqueue frames 0 RxQ-1, wt-0 drop frames
4911947 RxQ-1, wt-1 enqueue frames 0 RxQ-1, wt-1 drop frames
30544174 RxQ-1, wt-2 enqueue frames 0 RxQ-1, wt-2 drop frames
0 RxQ-2, wt-0 enqueue frames 0 RxQ-2, wt-0 drop frames
0 RxQ-2, wt-1 enqueue frames 0 RxQ-2, wt-1 drop frames
486603753 RxQ-2, wt-2 enqueue frames 0 RxQ-2, wt-2 drop frames
0 RxQ-3, wt-0 enqueue frames 0 RxQ-3, wt-0 drop frames
0 RxQ-3, wt-1 enqueue frames 0 RxQ-3, wt-1 drop frames
0 RxQ-3, wt-2 enqueue frames 0 RxQ-3, wt-2 drop frames
10145 TxQueue Drop Stats
0 TxBufferFull Drop Count 0 Rx Fcs Error Frames
0 TxBufferFrameDesc BadCrc16 0 Rx Invalid Oversize Frames
0 TxBuffer Bandwidth Drop Cou 0 Rx Invalid Too Large Frames
0 TxQueue Bandwidth Drop Coun 0 Rx Invalid Too Large Frames
0 TxQueue Missed Drop Statist 0 Rx Invalid Too Small Frames
435374 RxBuffer Drop DestIndex Cou 0 Rx Too Old Frames
0 SneakQueue Drop Count 0 Tx Too Old Frames
0 Learning Queue Overflow Fra 0 System Fcs Error Frames
0 Learning Cam Skip Count
0 Sup Queue 0 Drop Frames 7254 Sup Queue 8 Drop Frames
0 Sup Queue 1 Drop Frames 0 Sup Queue 9 Drop Frames
0 Sup Queue 2 Drop Frames 0 Sup Queue 10 Drop Frames
5260 Sup Queue 3 Drop Frames 0 Sup Queue 11 Drop Frames
0 Sup Queue 4 Drop Frames 0 Sup Queue 12 Drop Frames
0 Sup Queue 5 Drop Frames 0 Sup Queue 13 Drop Frames
0 Sup Queue 6 Drop Frames 0 Sup Queue 14 Drop Frames
0 Sup Queue 7 Drop Frames 0 Sup Queue 15 Drop Frames
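The two-column layout makes the nonzero counters easy to miss. Here is a hypothetical helper that splits each line into (label, count) pairs so the nonzero ones can be listed. It is a heuristic fitted to the output as pasted here (a standalone number starts a new counter unless it follows the word 'Queue', as in 'Sup Queue 8'), not a parser for any documented format.

```python
def asic_counters(capture):
    """Return (label, count) pairs from two-column PortASIC statistics lines."""
    pairs = []
    for line in capture.splitlines():
        tokens = line.split()
        # Counter lines start with a number; headers and separators are skipped.
        if not tokens or not tokens[0].isdigit():
            continue
        count, label = None, []
        for i, tok in enumerate(tokens):
            # A bare number begins a new counter, except in labels like "Sup Queue 8".
            if tok.isdigit() and (i == 0 or tokens[i - 1] != "Queue"):
                if label:
                    pairs.append((" ".join(label), count))
                count, label = int(tok), []
            else:
                label.append(tok)
        if label:
            pairs.append((" ".join(label), count))
    return pairs

def nonzero(pairs):
    """Keep only counters with a value above zero."""
    return [(label, count) for label, count in pairs if count > 0]
```

On the capture above, that leaves the RxQ enqueue counts plus four drop-related counters: TxQueue Drop Stats, RxBuffer Drop DestIndex Cou, Sup Queue 3 Drop Frames, and Sup Queue 8 Drop Frames.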
01-16-2013 08:32 AM
Have I posted this in the wrong forum? Is there a more appropriate board on this site that I should move this thread to?
thanks
k