11-05-2021 03:59 AM
Hello,
I have an issue with some packet loss between two switches: a 6509 and a 3600. The loss is small, around ~0.2%, but it is causing issues with some of our equipment connected to these switches.
The switches are connected via a 2x 1 Gbps LACP link with CWDM SFPs.
The loss appears only during peak time, not at night. The link is not saturated; traffic peaks at about 600 Mbps across both ports.
I see some output packet drops on the interfaces. What could be causing this? Any pointers on things I should check further?
Thank you in advance for your help.
Kind regards
11-05-2021 09:23 AM
"The traffic on the link is not saturated, it is max at 600mbps on both ports."
Ah, with dual gig links, the peak capacity is 2 Gbps. Something like 600 Mbps is a usage average over some time period (often 5 minutes).
Drops happen down at the millisecond level, so "normal" peak usage averages don't show "burst" loading.
What might help you to understand a common cause is reading about "microbursts".
Basically, if the aggregate traffic offered to an interface can exceed the interface's bandwidth, you can have drops, even with "low" overall utilization.
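One low-effort way to make bursts more visible (assuming your IOS supports it) is to shorten the interface load-averaging window from its 5-minute default. A sketch, using the interface name from your later output:

```
! Shorten the averaging window so the txload/rate figures in
! "show interface" react to bursts faster (still an average -
! millisecond microbursts will remain invisible)
interface GigabitEthernet3/24
 load-interval 30
```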
If you're wondering whether there is anything you can do to mitigate this, beyond providing more bandwidth, the answer is yes - sometimes. For example, you mention having a dual-link EtherChannel: is your hashing choice optimal for your traffic, assuming your device offers an optimal choice for it?
You might (if supported on your device) increase interface buffer/queue resources (which generally increases occasional latency while decreasing drops).
You might use QoS techniques to "protect" important/fragile traffic, and/or use early or tiered dropping to better manage traffic rates and (sometimes) diminish (not eliminate) overall drops.
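As a sketch of the early-dropping idea (assuming a 6500 LAN port with WRR transmit queues; the thresholds here are illustrative, not recommendations, and supported commands depend on the line card - verify with "show queueing interface" first):

```
! Illustrative only: enable WRED-style early dropping on transmit
! queue 1, with example min/max thresholds per drop priority
interface GigabitEthernet3/24
 wrr-queue random-detect 1
 wrr-queue random-detect min-threshold 1 40 70
 wrr-queue random-detect max-threshold 1 70 100
```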
11-05-2021 04:29 AM
A couple of things to check:
Physical:
1. Check the patch leads.
2. Try reseating or replacing the SFP modules (both sides, one at a time).
Configuration side:
1. Look at the configured LACP load-balancing method.
2. See which interfaces are showing drops (input or output):
show interface gi 0/X (both) - if possible, post the output here to help understand the issue.
Simple test:
1. Shut down one of the interfaces in the port-channel (since your traffic is less than 1 Gbps, you should not see any issue). If the packet loss is gone, you know which port is the culprit; if not, bring that interface back up and shut down the other one.
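The one-member-at-a-time test above might look something like this (interface names are placeholders - substitute your port-channel members):

```
configure terminal
 interface GigabitEthernet0/1
  shutdown
! ...watch "Total output drops" on the remaining member for a while...
 interface GigabitEthernet0/1
  no shutdown
 interface GigabitEthernet0/2
  shutdown
```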
Finally, post the code version you are running along with the output below:
1. show version
2. show interface Gi x/x (from both switches)
3. How are you pinging from switch to switch? What are the source and destination IPs? Are they directly connected to the switches?
11-05-2021 09:46 AM
Hello,
Thank you for your reply. I already tried shutting down one port to force all the traffic onto a single port, but the issue remains the same on either port.
Here is the output of one of the port:
s01gal#sh int g3/24
GigabitEthernet3/24 is up, line protocol is up (connected)
  Hardware is C6k 1000Mb 802.3, address is 0027.0dcb.2287 (bia 0027.0dcb.2287)
  Description: --to-S01REG-CWDM-gray
  MTU 9216 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 178/255, rxload 10/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is CWDM-1470
  input flow-control is off, output flow-control is off
  Clock mode is auto
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:18, output 00:00:12, output hang never
  Last clearing of "show interface" counters 05:54:36
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 2463168
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
I don't have any input or output errors but I see the Total output drops increasing quite a lot at peak time. Is it related to some queues?
Here are the versions of the switches:
3600#sh ver
Cisco IOS Software, ME360x Software (ME360x-UNIVERSALK9-M), Version 15.4(3)S5, RELEASE SOFTWARE (fc1)

6509#sh ver
Cisco IOS Software, s72033_rp Software (s72033_rp-ADVIPSERVICESK9-M), Version 15.1(2)SY10, RELEASE SOFTWARE (fc4)
Best
11-05-2021 10:12 AM - edited 11-05-2021 10:25 AM
Looking at your output :
s01gal#sh int g3/24
GigabitEthernet3/24 is up, line protocol is up (connected)
  Hardware is C6k 1000Mb 802.3, address is 0027.0dcb.2287 (bia 0027.0dcb.2287)
  Description: --to-S01REG-CWDM-gray
  MTU 9216 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 178/255, rxload 10/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is CWDM-1470
  input flow-control is off, output flow-control is off
  Clock mode is auto
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:18, output 00:00:12, output hang never
  Last clearing of "show interface" counters 05:54:36
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 2463168
As mentioned by other posters, you likely have a good amount of traffic bursting over short periods of time.
Troubleshooting:
https://www.cisco.com/c/en/us/support/docs/routers/10000-series-routers/6343-queue-drops.html
QoS check:
https://www.cisco.com/c/en/us/support/docs/switches/catalyst-6000-series-switches/10587-73.html
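On the 6500 side, the per-queue drop counters (rather than the aggregate "Total output drops" in "show interface") can be inspected with the following, assuming the command is supported on your line card:

```
show queueing interface GigabitEthernet3/24
! Look at the per-queue drop counters to see which transmit
! queue/threshold is actually discarding packets
```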
11-05-2021 12:25 PM
Thank you for the links, I will have a look. Also, looking at the graph, I see that one of the LACP ports is loaded at around 700 Mbps while the other one is only at 300 Mbps. Is it possible to even out the load across the two ports somehow?
11-06-2021 01:05 AM
"Also looking at the graph I see that one of the port of the LACP is loaded at around 700mbps while the other one is only at 300mbps. Is it possible to even the load across the two ports somehow?"
Configuration side:
1. Look at the configured LACP load-balancing method.
2. See which interfaces are showing drops (input or output):
show interface gi 0/X (both) - if possible, post the output here to help understand the issue.
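To see and (if the platform supports other hash options) change the hash method, something like the following - "src-dst-ip" here is only an example; pick whichever option mixes your real traffic best:

```
show etherchannel load-balance
configure terminal
 port-channel load-balance src-dst-ip
```

Note that the hash is per-flow: traffic will spread better across the two members only if there are many distinct flows, and a single large flow still cannot be split across links.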
11-05-2021 10:01 AM
Thank you for your reply. Indeed, it could be microbursts, as the link is carrying internet traffic for many users. What I am seeing on the interfaces is Total output drops increasing; is this counter directly related to the queues?
GigabitEthernet3/24 is up, line protocol is up (connected)
  Hardware is C6k 1000Mb 802.3, address is 0027.0dcb.2287 (bia 0027.0dcb.2287)
  Description: --to-S01REG-CWDM-gray
  MTU 9216 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 188/255, rxload 9/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is CWDM-1470
  input flow-control is off, output flow-control is off
  Clock mode is auto
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:44, output 00:00:36, output hang never
  Last clearing of "show interface" counters 06:15:46
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 2863062
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
What is the command to increase the output queue? Is the 6509 limited in its queue sizes? Would upgrading the hardware improve the problem? I was looking at Nexus switches as a replacement, as the 6509 is getting old.
Thank you for your help.
Best
11-05-2021 05:16 PM
"What I am seeing on the interfaces is a total output drops increasing, is this a counter related directly to the queues?"
Generally, yes.
"What is the command to increase output queue? Is the 6509 limited on queue sizes?"
Hmm, I don't recall whether 6500 LAN-type ports allow you to modify queue sizes. On those, as I recall, much is bound up in the line-card hardware, and capabilities and capacities depend heavily on the line card being used and even, on some line cards, on the actual ports being used (groups of Ethernet ports on a card are generally served by different ASICs).
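One way to see what the hardware on a given port actually offers before trying to tune anything (command availability varies by platform and line card):

```
! Shows, among other things, the tx/rx queue structure
! (e.g. tx-(1p3q8t)) the line-card hardware provides for this port
show interface GigabitEthernet3/24 capabilities
```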
"Would upgrading the hardware could improve the pb?"
Possibly, as might also happen with using different ports (on your existing line cards) or a different type of line card (this, and/or the ports being used, can make a big difference!). For dual classic-bus/fabric cards, the primary usage mode selected (i.e., bus or fabric) can sometimes make a difference. Also, with 6500s, whether the card has a DFC might impact drops (though DFCs generally have more to do with forwarding capacity).
"I was looking at the nexus switches for replacement as the 6509 is getting old."
Generally, a Nexus has much better hardware for high capacity than a Catalyst, but often with a reduced feature set.
11-05-2021 10:22 AM
Hello,
do you have any sort of QoS configured on either switch? Post the full running configs of both devices...