12-20-2022 04:23 AM - edited 12-21-2022 06:38 AM
Hello,
I have packet drops on one interface of a port-channel group, Gi0/23, incrementing by around 160k drops per day. The other port in the channel group has 111 packet drops and is not incrementing.
After checking, I found that the drops are on queue 3, weight 2, even though QoS is disabled.
Please check the output below; I would appreciate any guidance.
# show version
Cisco IOS Software, C2960X Software (C2960X-UNIVERSALK9-M), Version 15.2(4)E, RELEASE SOFTWARE (fc2)
ROM: Bootstrap program is C2960X boot loader
BOOTLDR: C2960X Boot Loader (C2960X-HBOOT-M) Version 15.2(3r)E1, RELEASE SOFTWARE (fc1)
uptime is 4 years, 16 weeks, 1 day, 22 hours, 25 minutes
System returned to ROM by power-on
System restarted at 13:04:19 GMT Wed Aug 29 2018
System image file is "flash:c2960x-universalk9-mz.152-4.E.bin"
Last reload reason: Reload command
#show mls qos
QoS is disabled
QoS ip packet dscp rewrite is enabled
#show run int po1
Building configuration...
interface Port-channel1
switchport trunk allowed vlan 1100,1101,1103-1107,1200-1202
switchport trunk native vlan 2
switchport mode trunk
end
#show run int gig0/23
Building configuration...
Current configuration : 355 bytes
!
interface GigabitEthernet0/23
switchport trunk allowed vlan 1100,1101,1103-1107,1200-1202
switchport trunk native vlan 2
switchport mode trunk
spanning-tree vlan 8-9 port-priority 64
channel-group 1 mode active
end
#show int gig 0/23
GigabitEthernet0/23 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is 007e.
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 36/255, rxload 31/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:03, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 159696953
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 124513000 bits/sec, 19988 packets/sec
5 minute output rate 142845000 bits/sec, 24041 packets/sec
2063825130141 packets input, 1341566991758969 bytes, 0 no buffer
Received 5761233 broadcasts (4532790 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 4532790 multicast, 0 pause input
0 input packets with dribble condition detected
2785837489335 packets output, 1971269995213059 bytes, 0 underruns
0 output errors, 0 collisions, 2 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
#show flowcontrol interface gig0/23
Port       Send FlowControl    Receive FlowControl   RxPause   TxPause
           admin     oper      admin     oper
---------  --------  --------  --------  --------    -------   -------
Gi0/23     Unsupp.   Unsupp.   off       off         0         0
#show platform port-asic stats drop gig 0/23
Interface Gi0/23 TxQueue Drop Statistics
Queue 0
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 1
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 2
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 3
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 159696953
#show mls qos int gig 0/23 statistics
GigabitEthernet0/23 (All statistics are in packets)
dscp: incoming
-------------------------------
0 - 4 : 4001630853 0 5515379 0 0
5 - 9 : 0 0 0 7640568 0
10 - 14 : 41920357 0 1725454 0 0
15 - 19 : 0 0 0 3396507891 0
20 - 24 : 1852321904 0 0 0 758787218
25 - 29 : 0 2519401224 0 74898 0
30 - 34 : 0 0 559 0 2272418407
35 - 39 : 0 17 0 0 0
40 - 44 : 24 0 0 0 0
45 - 49 : 0 3520959767 0 218587 0
50 - 54 : 290694 0 0 0 0
55 - 59 : 0 63 0 0 0
60 - 64 : 0 0 0 0
dscp: outgoing
-------------------------------
0 - 4 : 773854433 0 17394 0 3847289
5 - 9 : 0 0 0 12 0
10 - 14 : 425271 0 200641 0 0
15 - 19 : 0 0 0 3488667492 0
20 - 24 : 2373898109 0 0 0 1327676910
25 - 29 : 0 4128164582 0 76635 0
30 - 34 : 0 0 0 0 4238127952
35 - 39 : 0 0 0 0 0
40 - 44 : 0 0 0 0 0
45 - 49 : 0 2414919368 0 1960067 0
50 - 54 : 1599670 0 6495 0 220
55 - 59 : 0 0 0 0 0
60 - 64 : 0 0 0 0
cos: incoming
-------------------------------
0 - 4 : 2249208254 0 0 271887 0
5 - 7 : 0 0 0
cos: outgoing
-------------------------------
0 - 4 : 2427650520 5634 0 140402 5560
5 - 7 : 265952604 17627 13992536
output queues enqueued:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 0 0 0
queue 1: 0 2 617598217
queue 2: 0 0 0
queue 3: 0 0 2090166818
output queues dropped:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 0 0 0
queue 1: 0 0 0
queue 2: 0 0 0
queue 3: 0 0 159696953
#show buffers failures
Caller Pool Size When
Any idea what should be done to mitigate this issue?
Thanks in advance.
12-20-2022 08:24 AM
OK, the drop stats are incrementing; what about the enqueued stats, are any of those incrementing too? The reason I ask: I recall that 1) ASIC stats often clear only with a device reload, and 2) I wonder whether QoS was ever enabled.
That aside, assuming the stats are valid, i.e. not some stats bug, you want to mitigate drops, correct?
Three possible approaches come to mind.
First, since the stats show such a huge drop imbalance between your two Etherchannel links, is your hashing algorithm "optimal" for your traffic? I.e. what do the load usage stats look like between the two Etherchannel links? Perhaps another load-balancing algorithm would better load-share your links, reducing overall drops.
Second, perhaps you would benefit from having more bandwidth, either via adding one or more links to your Etherchannel, or, if possible, moving this Etherchannel to 10g.
Third, I believe/recall the Catalyst 2960 series uses the same basic architecture as the 3560/3750 switches, but with even fewer hardware buffer resources. Possibly, enabling QoS, which allows (egress) buffer tuning, might mitigate the issue; a sketch follows below. (BTW, buffer tuning might be the best solution for transient congestion, while additional bandwidth is better for sustained congestion, assuming you want to support that. [I.e. for the latter, another QoS option, again with QoS enabled, is to "target" low-priority bandwidth hogs for packet dropping while avoiding dropping other traffic's packets.])
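For illustration only, here is a rough sketch of what that egress buffer tuning could look like on a 2960X once QoS is enabled. Note the numbering offset: the ASIC stats count queues 0-3, while the mls qos queue-set commands count 1-4, so the dropping queue would be queue 4 there. The buffer split and threshold values below are assumptions for the sketch, not recommendations (ranges vary by IOS version), and enabling mls qos by itself changes queueing behavior, so test during a maintenance window.
#configure terminal
 mls qos
 ! skew the shared egress buffer pool toward queue 4 (ASIC queue 3); split is illustrative
 mls qos queue-set output 1 buffers 15 25 25 35
 ! raise queue 4 thresholds and let it borrow from the common pool
 ! (syntax: threshold <queue> <drop-thr1> <drop-thr2> <reserved> <maximum>)
 mls qos queue-set output 1 threshold 4 3100 3100 100 3200
 end
#show mls qos queue-set 1
#show mls qos interface gigabitEthernet 0/23 statistics
If drops shrink noticeably after such tuning, that points to buffer exhaustion on that queue rather than a pure bandwidth shortfall.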
12-20-2022 09:20 AM
The switch has fixed hardware queues; you cannot add or remove them.
Queue 3 (queue 4 when counting from 1) typically carries bulk data.
Bulk data is usually TCP traffic, so there is likely a server that connects through the port-channel and receives this data.
But why is only one member port facing this high volume of bulk-data drops?
Yes, it can happen if the hash depends only on source/destination addresses, or on source/destination plus the destination L4 port.
So what can you do?
Try changing the hash the port-channel uses; ideally include the source L4 port. A quick check of the current behavior is sketched below.
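Before changing anything, a quick sketch of how to see what the port-channel is doing today (the second member's name, Gi0/24, is an assumption, since it is not shown in this thread):
#show etherchannel load-balance
#show etherchannel 1 summary
#show interfaces gigabitEthernet 0/23 | include output rate
#show interfaces gigabitEthernet 0/24 | include output rate
If the two members' 5-minute output rates are heavily skewed, the hash is not sharing this traffic mix well; if they are roughly even and one member still drops, the flows landing on that member are simply burstier.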
12-20-2022 11:02 AM
Thanks for the reply.
One more observation: these drops increase during peak hours only.
The enqueue stats are increasing too:
#show platform port-asic stats enqueue gigabitEthernet 0/23
Interface Gi0/23 TxQueue Enqueue Statistics
Queue 0
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 1
Weight 0 Frames 0
Weight 1 Frames 2
Weight 2 Frames 617723187
Queue 2
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 3
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 2569887685
Changing the load-balancing algorithm sounds like a good idea, but unfortunately these are the only options I have on this switch:
#port-channel load-balance ?
dst-ip Dst IP Addr
dst-mac Dst Mac Addr
src-dst-ip Src XOR Dst IP Addr
src-dst-mac Src XOR Dst Mac Addr
src-ip Src IP Addr
src-mac Src Mac Addr
Which one do you think would fit best?
12-20-2022 11:21 AM
If it's not already the default, src-dst-ip is often a good, and sometimes the best, choice.
Are all the enqueued stats increasing, or just those for one queue?
12-20-2022 11:44 AM
src-dst-ip is good.
Give it a try; hopefully it solves your issue. A configuration sketch follows below.
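A minimal sketch of the change and its verification; port-channel load-balance is a global command on this platform, so it affects all Etherchannels on the switch:
#configure terminal
 port-channel load-balance src-dst-ip
 end
#show etherchannel load-balance
#write memory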
12-20-2022 11:48 AM
Hi Joseph,
Is this the output you are asking for?
#show mls qos int gig 0/23 statistics
GigabitEthernet0/23 (All statistics are in packets)
dscp: incoming
-------------------------------
0 - 4 : 29145066 0 5515379 0 0
5 - 9 : 0 0 0 7647993 0
10 - 14 : 41920396 0 1725457 0 0
15 - 19 : 0 0 0 3613777183 0
20 - 24 : 1852855384 0 0 0 758805003
25 - 29 : 0 2520514230 0 74898 0
30 - 34 : 0 0 569 0 2273284132
35 - 39 : 0 17 0 0 0
40 - 44 : 24 0 0 0 0
45 - 49 : 0 3521148308 0 218587 0
50 - 54 : 290694 0 0 0 0
55 - 59 : 0 63 0 0 0
60 - 64 : 0 0 0 0
dscp: outgoing
-------------------------------
0 - 4 : 1209767715 0 17394 0 3847289
5 - 9 : 0 0 0 12 0
10 - 14 : 425271 0 200641 0 0
15 - 19 : 0 0 0 3603096075 0
20 - 24 : 2373898228 0 0 0 1327698423
25 - 29 : 0 4128287611 0 76635 0
30 - 34 : 0 0 0 0 4240964660
35 - 39 : 0 0 0 0 0
40 - 44 : 0 0 0 0 0
45 - 49 : 0 2415292802 0 1960078 0
50 - 54 : 1600004 0 6495 0 220
55 - 59 : 0 0 0 0 0
60 - 64 : 0 0 0 0
cos: incoming
-------------------------------
0 - 4 : 2791705020 0 0 271963 0
5 - 7 : 0 0 0
cos: outgoing
-------------------------------
0 - 4 : 2981483878 5634 0 140402 5560
5 - 7 : 265981428 17627 13992536
output queues enqueued:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 0 0 0
queue 1: 0 2 617738946
queue 2: 0 0 0
queue 3: 0 0 2643888275
output queues dropped:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 0 0 0
queue 1: 0 0 0
queue 2: 0 0 0
queue 3: 0 0 159696953 <-- this value matches the Total output drops on the interface
#show int gig 0/23 | inc output drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 159696953
12-21-2022 02:17 PM
Yes.
output queues enqueued:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 0 0 0
queue 1: 0 2 617738946
queue 2: 0 0 0
queue 3: 0 0 2643888275
Are both of these counters increasing?
12-21-2022 04:15 AM
Thank you @Joseph W. Doherty @MHM Cisco World
I will change the load-balancing algorithm to src-dst-ip and let you know whether it solves the issue.
01-12-2023 05:00 AM
Hello,
There are still packet drops after changing the load-balancing algorithm:
Queue 3
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 2600962860
Any idea what else could be the reason?
01-12-2023 08:11 AM
"any idea what else could be the reason?"
Sure, too much traffic being sent to an egress port in too short a time interval.
Understand, changing the LB algorithm attempts to obtain the best possible load sharing for your traffic, but this is not guaranteed. Also, even with perfect LB, you may still be oversubscribed and have drops.
If a link/port is (bandwidth) oversubscribed, your mitigations include providing more bandwidth (e.g. add more links to your Etherchannel, as sketched below, and/or move to the next available bandwidth tier [e.g. 100 Mbps => gig => 10g, etc.]), using QoS to better manage your traffic's bandwidth demand (e.g. buffer management, drop management, bandwidth management, etc.), and/or using some form of even better load balancing (e.g. MLPPP [usually not fast enough to use with high-speed Ethernet], PfR [requires L3 links], etc.).
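As an example of the first option, a sketch of adding a third member to Po1 using a hypothetical spare port, Gi0/22 (a matching change is needed on the far-end switch, and the trunk settings must match Port-channel1 exactly or the port will be suspended):
#configure terminal
 interface GigabitEthernet0/22
  switchport trunk allowed vlan 1100,1101,1103-1107,1200-1202
  switchport trunk native vlan 2
  switchport mode trunk
  channel-group 1 mode active
 end
#show etherchannel 1 summary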