05-24-2024 07:22 AM
Just as an FYI - I'm already working with Cisco TAC on this issue, but I'm looking for any insight from others who have had to deal with a similar issue: output drops (packet loss) due to instantaneous burst traffic.
A client that we're working with uses Industrial Ethernet (IE) 4010 switches. The trouble is that we're seeing output drops listed in "show interface Gi1/<port#>". However, the network itself isn't sustaining much more than about 50 Mbps (7/255 on the txload metric) overall. All switch-to-switch and switch-to-host connections are 1 Gbps. The network isn't overutilized, nor are we trying to fit 1 Gbps of traffic into, say, a 100 Mbps interface. What seems to be happening is that, on a momentary (instantaneous) basis, more than 1 megabit per millisecond (1 gigabit per second) gets pumped in, and this seemingly overloads the "store and forward" buffers. (See the Wireshark image below.)
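To put numbers on the gap between the average load and the instantaneous limit, here is a minimal Python sketch. It assumes txload is reported as a fraction of 255 of the interface bandwidth and that it is an average over the interface's load interval, so it cannot show millisecond-scale bursts. The function name is made up for illustration.

```python
# Convert the txload fraction from "show interface" into an average bit rate.
# Assumption: txload is reported as n/255 of the interface bandwidth.

def txload_to_bps(txload: int, link_bps: float) -> float:
    """Average transmit rate implied by a txload of txload/255."""
    return txload / 255 * link_bps

LINK_BPS = 1_000_000_000  # 1 Gbps links, as described in the post

avg_bps = txload_to_bps(7, LINK_BPS)

# The most a 1 Gbps port can transmit in one millisecond.
bits_per_ms_at_line_rate = LINK_BPS / 1000

print(f"average load: {avg_bps / 1e6:.1f} Mbps")
print(f"line-rate capacity per millisecond: {bits_per_ms_at_line_rate / 1e6:.1f} Mbit")
```

An average of a few tens of Mbps says nothing about whether any single millisecond was offered more than the 1 Mbit the port can send in that time, which is why low txload and output drops can coexist.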
This might be OK if it were TCP traffic, as the lost packets would just get retransmitted. But this is a video monitoring system for the customer, and it is UDP traffic, so lost packets just produce short-term glitchy or frozen video. The software team is looking at the software controlling the system, but I need to look at making adjustments on the network side, too. I'm out of my element here, which is why I've engaged with Cisco TAC. Has anyone else had to deal with this (to me) esoteric issue of "microburst" traffic?
I did find a web article from, presumably, a CCIE, titled "Cisco Catalyst 9K Output Drops Quick Fix", wherein they suggest using "qos queue-softmax-multiplier <X>" to change the available shared pool of buffer memory. Unfortunately, that command does not exist on the IE4010 (which is running the latest IOS version - 15.2(8)E5). As a proof of concept, I was able to find a Catalyst 3850, temporarily swap all the connections to it, and implement the "softmax-multiplier" command, and this did fully solve the "microburst" traffic problem. But the customer does need the IE4010 switch, as it will be used in a "severe service" location that would be inappropriate for a Catalyst switch.
05-24-2024 07:25 AM
Bursting traffic on one of the switch's interfaces
05-24-2024 08:46 AM
The cures for microbursting are deeper queues (which you've seen effective on a 3850) and/or dequeuing priority (relative to other traffic which might be the traffic causing transient congestion).
I'm unfamiliar with the IE switches, so without some research, I cannot say what their QoS capabilities are. Possibly the interface egress queue can be increased with the interface hold-queue command.
You might review your switch's QoS features. (I might too, but what's the IOS version and feature set?)
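As a rough illustration of why a deeper queue helps with a microburst, here is a minimal Python sketch using a fluid approximation. The burst shape (two 1 Gbps senders converging on one 1 Gbps egress port for 1 ms) and both queue sizes are hypothetical values chosen for the example, not measurements from this network, and the sketch does not model dequeuing priority.

```python
def bytes_dropped(ingress_bps: float, egress_bps: float,
                  burst_seconds: float, queue_bytes: float) -> float:
    """Bytes lost when a burst arrives faster than the egress port drains.

    Fluid approximation: the queue starts empty and fills at the excess
    rate (ingress minus egress) for the duration of the burst.
    """
    excess_bps = max(0.0, ingress_bps - egress_bps)
    backlog_bytes = excess_bps * burst_seconds / 8
    return max(0.0, backlog_bytes - queue_bytes)

GBPS = 1_000_000_000

# Hypothetical burst: 2 Gbps offered to a 1 Gbps egress port for 1 ms.
# Hypothetical queue depths: a shallow one and a deeper one.
for queue_bytes in (40_000, 160_000):
    lost = bytes_dropped(2 * GBPS, 1 * GBPS, 0.001, queue_bytes)
    print(f"queue of {queue_bytes} bytes: {lost:.0f} bytes dropped")
```

In this toy model the burst builds a 125,000-byte backlog, so the shallow queue overflows while the deeper one absorbs the burst and drains it afterwards.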
05-24-2024 01:42 PM
Hi Joseph
The IE4010 switches are 12 copper, 16 SFP (28 Gigabit Ethernet interfaces in total), and run the current Cisco-recommended IOS version, 15.2(8)E5. Here is a show version:
Cisco IOS Software, IE4010 Software (IE4010-UNIVERSALK9-M), Version 15.2(8)E5, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2023 by Cisco Systems, Inc.
Compiled Wed 08-Nov-23 00:28 by mcpre
ROM: Bootstrap program is IE4010 boot loader
BOOTLDR: IE4010 Boot Loader (IE4010-HBOOT-M) Version 15.2(4r)EC, RELEASE SOFTWARE (fc1)
CAS-NS1 uptime is 2 weeks, 6 days, 18 hours, 12 minutes
System returned to ROM by power-on
System restarted at 23:26:55 UTC Thu May 2 2024
System image file is "sdflash:/ie4010-universalk9-mz.152-8.E5.bin"
Last reload reason: Reload command
This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.
A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html
If you require further assistance please contact us by sending email to
export@cisco.com.
License Level: lanbase
License Type: Permanent Right-To-Use
Next reload license Level: lanbase
cisco IE-4010-16S12P (APM86XXX) processor (revision Z0) with 1048576K bytes of memory.
Processor board ID
Last reset from power-on
3 Virtual Ethernet interfaces
28 Gigabit Ethernet interfaces
The password-recovery mechanism is enabled.
512K bytes of flash-simulated non-volatile configuration memory.
Base ethernet MAC Address : CC:36:CF:22:1F:00
Motherboard assembly number : 73-101623-04
Motherboard serial number :
Model revision number : Z0
Motherboard revision number : D0
Model number : IE-4010-16S12P
System serial number :
Top Assembly Part Number : 68-6048-02
Top Assembly Revision Number : C0
Version ID : V02
Hardware Board Revision Number : 0x05
Backplane FPGA version : 1.2B
CIP Serial Number : 0xF2221F00
SKU Brand Name : Cisco
Device Manager Package : None
Switch Ports Model SW Version SW Image
------ ----- ----- ---------- ----------
* 1 28 IE-4010-16S12P 15.2(8)E5 IE4010-UNIVERSALK9-M
Configuration register is 0xF
I'm looking at Cisco documentation for the IE4k regarding QoS: Congestion Avoidance and Queuing. But it seems to me like this is just qualifying / categorizing traffic inside of (or using) the existing interface buffers, whereas that Catalyst 'softmax-multiplier' command seems to increase the interface buffers. I don't know if that's the correct interpretation.
However, it seems like maybe the "hold-queue" command does increase them? Is it perhaps just an interface-level command, versus a global-level one like "softmax-multiplier"? I'm not familiar with either of these commands, but I do see that the IE4k has the "hold-queue" command for the interfaces.
05-24-2024 04:23 PM
I've skimmed the QoS information. Somewhat similar to QoS on older Catalyst platforms.
Do you have an egress service policy now? If not:
Default QoS Configuration
There are no policy maps, class maps, table maps, or policers configured. At the egress port, all traffic goes through a single default queue that is given the full operational port bandwidth. The default size of the default queue is 160 (256-byte) packets.
Assuming you don't want to get into multiple classes (4 max) we can define a service policy to increase the queue depth as follows:
policy-map Sample
 class class-default
  queue-limit number-of-packets
For number-of-packets, set the minimum threshold for WTD. The range is from 16 to 544, in multiples of 16, where each packet is a fixed unit of 256 bytes.
Note: For optimal performance, we strongly recommend that you configure the queue-limit to 272 or less.
For egress interface showing drops:
interface GigabitEthernet#
 service-policy output Sample
First try the recommended 272 and if that doesn't work, next try the max value of 544. If 544 works, you might try working your way back toward 272 until you see drops again.
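To give a feel for what these queue-limit values mean in bytes and time, here is a minimal Python sketch. It takes the 256-byte unit size and the 16 to 544 range from the documentation quoted above and assumes a 1 Gbps egress port. The function names are made up for illustration, and how the switch actually accounts real frames against these units is not something the sketch tries to model.

```python
UNIT_BYTES = 256          # each queue-limit "packet" is a fixed 256-byte unit (per the quoted doc)
LINK_BPS = 1_000_000_000  # assumed 1 Gbps egress port

def queue_bytes(units: int) -> int:
    """Queue depth in bytes for a given queue-limit value."""
    if units % 16 or not 16 <= units <= 544:
        raise ValueError("queue-limit must be a multiple of 16 between 16 and 544")
    return units * UNIT_BYTES

def drain_time_us(units: int) -> float:
    """Microseconds needed to transmit a completely full queue at line rate."""
    return queue_bytes(units) * 8 / LINK_BPS * 1e6

# Default (160), recommended ceiling (272), and maximum (544).
for units in (160, 272, 544):
    print(f"queue-limit {units}: {queue_bytes(units)} bytes, "
          f"about {drain_time_us(units):.0f} us to drain at 1 Gbps")

# Candidate values for stepping back down from 544 toward 272.
step_down = list(range(544, 271, -16))
print(step_down)
```

Even the maximum setting holds roughly a millisecond of line-rate traffic, so the approach only helps if the bursts are shorter than that order of magnitude.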
05-28-2024 06:31 AM
Good morning Joseph
Thank you for the response. I will try this out this morning and let you know.
10-23-2024 03:56 AM
Hi, did this solution work? I tried applying it to an interface on my switch and got the error message below, which seems a bit limiting:
QoS: Configuration failed. policy-map Queue_Depth has an invalid number of class-maps. All output policies must have same number of class-maps.
10-24-2024 08:53 AM
Hi Russell
So this was what I was able to find that I could apply to our IE 4010 switches:
!
policy-map burst-policy
 class class-default
  queue-limit 544 packets
!
interface GigabitEthernet 1/26
 service-policy output burst-policy
!
This particular client's network build got shelved for now, and I was moved to a different project. Last I understood, an unmanaged switch / media converter in the line was identified as a problem. Due to distance requirements, fiber is being run from the SFPs of the 4010 to the end-device location(s), where the fiber is then connected to a small unmanaged 4-port switch (which has a fiber input). If it's swapped out for, say, a managed switch like an IE 2k, the problem is greatly alleviated.
Though, my guess (uneducated as it may be) is that the software portion of the network we're building for the client is buggy and duplicating traffic unnecessarily and needs to be tamed.
10-24-2024 04:40 PM