Output drops: Industrial Ethernet 4010 - "Microburst" traffic

Thomas-Ramsey
Level 1

Just as an FYI: I'm already working with Cisco TAC on this issue, but I'm looking for any insight from others who have dealt with something similar: output drops (packet loss) caused by instantaneous bursts of traffic.

A client we're working with uses Industrial Ethernet (IE) 4010 switches. The trouble is that we're seeing output drops in the "show interface Gi1/<port#>" output, even though the network isn't sustaining more than about 50 Mbps overall (7/255 on the TxLoad metric). All switch-to-switch and switch-to-host connections are 1 Gbps, so the network isn't overutilized, nor are we trying to fit 1 Gbps of traffic into, say, a 100 Mbps interface. What seems to be happening is that, on a momentary (instantaneous) basis, more than 1 megabit per millisecond (the equivalent of 1 gigabit per second) gets pumped toward a port, and this seemingly overloads the "store and forward" buffers. (See the Wireshark image below.)
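To illustrate how a port can drop packets while its average load looks low, here is a minimal Python sketch of a single FIFO egress queue stepped in 1 ms ticks. It is a simplified fluid model, not a description of the IE4010's actual buffer architecture. The traffic pattern (two line-rate senders bursting toward one port for 2 ms every 100 ms) and the 40,960-byte buffer are assumptions chosen for illustration, and `simulate_egress` is a hypothetical helper name.

```python
def simulate_egress(arrivals_bytes, line_rate_bps, buffer_bytes, tick_ms=1):
    """Fluid model of one FIFO egress queue, stepped in fixed ticks.

    Returns (average offered load in bps, total bytes dropped).
    """
    # Bytes the port can transmit during one tick at line rate.
    drain_per_tick = line_rate_bps * tick_ms // 8000
    queued = 0
    dropped = 0
    for arriving in arrivals_bytes:
        backlog = queued + arriving
        backlog -= min(backlog, drain_per_tick)  # transmit at line rate
        if backlog > buffer_bytes:               # overflow is tail-dropped
            dropped += backlog - buffer_bytes
            backlog = buffer_bytes
        queued = backlog
    duration_ms = len(arrivals_bytes) * tick_ms
    avg_bps = sum(arrivals_bytes) * 8 * 1000 / duration_ms
    return avg_bps, dropped


# Hypothetical pattern: every 100 ms, two 1 Gbps senders burst at line rate
# toward the same egress port for 2 ms (2 x 125,000 bytes per ms).
arrivals = [0] * 1000
for start in range(0, 1000, 100):
    arrivals[start] = arrivals[start + 1] = 250_000

avg_bps, dropped = simulate_egress(arrivals, 1_000_000_000, buffer_bytes=40_960)
print(f"average offered load: {avg_bps / 1e6:.0f} Mbps, dropped: {dropped} bytes")
```

Under these assumptions the one-second average is only 40 Mbps, yet bytes are dropped in every burst, because during the burst the port receives twice what it can transmit and the buffer fills within a fraction of a millisecond.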

This might be OK if it were TCP traffic, as the lost packets would just be retransmitted. But this is a video monitoring system for the customer, and it is UDP traffic, so lost packets just produce short-term glitchy or frozen video. The software team is looking at the software controlling the system, but I need to look at making adjustments on the network side, too. I'm out of my element here, which is why I've engaged Cisco TAC. Has anyone else had to deal with this (to me) esoteric issue of "microburst" traffic?

I did find a web article, presumably written by a CCIE, titled "Cisco Catalyst 9K Output Drops Quick Fix", in which they suggest using "qos queue-softmax-multiplier <X>" to change the available shared pool of buffer memory. Unfortunately, that command does not exist on the IE4010 (which is running the latest IOS version, 15.2(8)E5). As a proof of concept, I found a Catalyst 3850, temporarily swapped all the connections to it, and implemented the "softmax-multiplier" command, and this fully solved the "microburst" traffic problem. But the customer does need the IE4010 switch, as it will be used in a "severe service" location that would be inappropriate for a Catalyst switch.

8 Replies

Thomas-Ramsey
Level 1

BurstTraffic.png

Bursting traffic on one of the switch's interfaces

Joseph W. Doherty
Hall of Fame

The cures for microbursting are deeper queues (which you've seen to be effective on a 3850) and/or dequeuing priority (relative to other traffic, which might be what is causing the transient congestion).

I'm unfamiliar with the IE switches, so without some research I cannot say what their QoS capabilities are. Possibly the interface egress queue can be increased with the interface hold-queue command.

You might review your switch's QoS features.  (I might too, but what's the IOS version and feature set?)

Hi Joseph

The IE4010 switches are 12 copper, 16 SFP, and run the current Cisco-recommended IOS version, 15.2(8)E5. Here is a show version:

 

Cisco IOS Software, IE4010  Software (IE4010-UNIVERSALK9-M), Version 15.2(8)E5, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2023 by Cisco Systems, Inc.
Compiled Wed 08-Nov-23 00:28 by mcpre

ROM: Bootstrap program is IE4010 boot loader
BOOTLDR: IE4010  Boot Loader (IE4010-HBOOT-M) Version 15.2(4r)EC, RELEASE SOFTWARE (fc1)

CAS-NS1 uptime is 2 weeks, 6 days, 18 hours, 12 minutes
System returned to ROM by power-on
System restarted at 23:26:55 UTC Thu May 2 2024
System image file is "sdflash:/ie4010-universalk9-mz.152-8.E5.bin"
Last reload reason: Reload command



This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
export@cisco.com.

License Level: lanbase
License Type: Permanent Right-To-Use
Next reload license Level: lanbase

cisco IE-4010-16S12P (APM86XXX) processor (revision Z0) with 1048576K bytes of memory.
Processor board ID 
Last reset from power-on
3 Virtual Ethernet interfaces
28 Gigabit Ethernet interfaces
The password-recovery mechanism is enabled.

512K bytes of flash-simulated non-volatile configuration memory.
Base ethernet MAC Address       : CC:36:CF:22:1F:00
Motherboard assembly number     : 73-101623-04
Motherboard serial number       : 
Model revision number           : Z0
Motherboard revision number     : D0
Model number                    : IE-4010-16S12P
System serial number            : 
Top Assembly Part Number        : 68-6048-02
Top Assembly Revision Number    : C0
Version ID                      : V02
Hardware Board Revision Number  : 0x05
Backplane FPGA version          : 1.2B
CIP Serial Number               : 0xF2221F00
SKU Brand Name                  : Cisco
Device Manager Package          : None


Switch Ports Model                     SW Version            SW Image                 
------ ----- -----                     ----------            ----------               
*    1 28    IE-4010-16S12P            15.2(8)E5             IE4010-UNIVERSALK9-M     


Configuration register is 0xF

 

I'm looking at the Cisco documentation for the IE4k regarding QoS (Congestion Avoidance and Queuing). But it seems to me that this just classifies traffic within the existing interface buffers, whereas the Catalyst 'softmax-multiplier' command seems to increase the interface buffers. I don't know if that's the correct interpretation.

However, it seems like maybe the "hold-queue" command does? Is it perhaps just an interface-level command, versus a global-level one like "softmax-multiplier"? I'm not familiar with either of these commands, but I do see that the IE4k has the "hold-queue" command for interfaces.

I've skimmed the QoS information. It's somewhat similar to QoS on older Catalyst platforms.

Do you have an egress service policy now?  If not:

Default QoS Configuration

There are no policy maps, class maps, table maps, or policers configured. At the egress port, all traffic goes through a single default queue that is given the full operational port bandwidth. The default size of the default queue is 160 (256-byte) packets.
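To put the default queue size quoted above in perspective, the following Python sketch estimates how long such a queue can absorb a burst before it overflows. It assumes the queue holds exactly 160 x 256 bytes and that traffic arrives from two 1 Gbps ports toward one 1 Gbps egress port. Both the traffic scenario and the helper name `burst_absorb_time_us` are illustrative assumptions, not figures from Cisco documentation.

```python
def burst_absorb_time_us(queue_units, unit_bytes, ingress_bps, egress_bps):
    """Microseconds until a queue of queue_units * unit_bytes fills.

    Returns None when ingress does not exceed egress (the queue never fills).
    """
    excess_bps = ingress_bps - egress_bps
    if excess_bps <= 0:
        return None
    buffer_bits = queue_units * unit_bytes * 8
    return buffer_bits * 1_000_000 / excess_bps


# Default queue from the quoted text: 160 units of 256 bytes = 40,960 bytes.
# Assumed scenario: 2 Gbps aggregate arriving, 1 Gbps leaving.
default_us = burst_absorb_time_us(160, 256, 2_000_000_000, 1_000_000_000)
print(f"default queue absorbs about {default_us:.0f} microseconds of such a burst")
```

Under these assumptions the default queue fills in roughly a third of a millisecond, which is consistent with drops appearing on a port whose average load is low.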

Assuming you don't want to get into multiple classes (4 max), we can define a service policy to increase the queue depth as follows:

policy-map Sample
 class class-default
  queue-limit number-of-packets

For number-of-packets, set the minimum threshold for WTD. The range is from 16 to 544, in multiples of 16, where each packet is a fixed unit of 256 bytes.

Note: For optimal performance, we strongly recommend that you configure the queue-limit to 272 or less.

For egress interface showing drops:

interface GigabitEthernet#
 service-policy output Sample

First try the recommended 272 and if that doesn't work, next try the max value of 544.  If 544 works, you might try working your way back toward 272 until you see drops again.
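The queue-limit constraints and the step-down approach described above can be sketched in Python as follows. This is only a planning aid for choosing values to try. The function names are hypothetical, and the byte figures assume that each queue-limit unit is exactly 256 bytes, as the quoted documentation states.

```python
def valid_queue_limit(packets, low=16, high=544, step=16):
    """True if the value is in the documented range and a multiple of 16."""
    return low <= packets <= high and packets % step == 0


def queue_bytes(packets, unit_bytes=256):
    """Queue capacity in bytes, assuming fixed 256-byte units."""
    return packets * unit_bytes


def step_down_candidates(start=544, floor=272, step=16):
    """Values to try when working back from the maximum toward 272."""
    return list(range(start, floor - 1, -step))


for limit in (160, 272, 544):
    print(limit, valid_queue_limit(limit), queue_bytes(limit), "bytes")

candidates = step_down_candidates()
print(candidates)
```

Each candidate would still need to be applied on the switch and checked against the interface's output drop counter; the sketch only enumerates the values that satisfy the documented range.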

Good morning Joseph

Thank you for the response. I will try this out this morning and let you know.

Hi, did this solution work? I tried applying it to an interface on my switch and got the error message below, which seems a bit limiting:

QoS: Configuration failed.  policy-map Queue_Depth has an invalid number of class-maps.  All output policies must have same number of class-maps.

Hi Russell

So this was what I was able to find that I could apply to our IE 4010 switches:

!
policy-map burst-policy
 class class-default
  queue-limit 544 packets
!
interface GigabitEthernet 1/26
 service-policy output burst-policy
!

This particular client's network build got shelved for now, and I was moved to a different project. Last I understood, an unmanaged switch / media converter in the line was identified as a problem. Due to distance requirements, fiber is being run from the SFPs of the 4010 to the end-device location(s), where the fiber is then connected to a small unmanaged 4-port switch (which has a fiber input). If it's swapped out for a managed switch, such as an IE 2k, the problem is greatly alleviated.

Though my guess (uneducated as it may be) is that the software portion of the network we're building for the client is buggy, is duplicating traffic unnecessarily, and needs to be tamed.

Hi

Thanks for the response. I did some more digging, and my problem is that, whilst it is only 50 Mbps going over a 100 Mbps link, it is all trying to go through one of the 4 egress queues. On the old 3750 with MLS QoS, you used to be able to adjust the buffers on each queue to match your traffic profile. I have raised this with Cisco TAC.