ESW540 Performance Degradation to 100Mb when using LACP/Etherchannels on 1Gb Ports

tim
Level 1

I have three ESW-540-24 10/100/1000 switches in a small school environment:

1. The first ESW performs as a server switch for our small cluster of VMware ESXi hosts and iSCSI SAN, with a link-aggregation/LACP/etherchannel connection to the backbone switch and a link-aggregated connection to the third ESW switch via a multimode optic fibre link to a near-site backup and DR location.

2. The second ESW, acting as network backbone, links back to the server switch and to our older Linksys SRW224G4 switches (four SRW224G4s) using aggregated links / LACP to reduce bandwidth contention and allow for link redundancy.

3. The third ESW, as mentioned previously, is at the backup DR location, linked back to Switch 1.

When using single 1Gb links between these three switches I can almost saturate the link (80-95% utilisation). As soon as link aggregation is configured by bonding 2x 1Gb links together to form an etherchannel, link utilisation will not exceed 100Mb (the network monitor graph on a server/workstation runs flat at 10% utilisation). I have tested this multiple times using large file transfers across our SANs (which have high enough throughput to saturate the links) and can confirm that the performance degradation occurs as soon as an etherchannel is configured on the same ports, regardless of manually setting the admin speed and duplex of the copper ports etc. All indicators show that the ports are running at 1Gb even though throughput REDUCES by 90%.
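For anyone wanting to check the arithmetic, here is a minimal sketch of the utilisation figures quoted above. It reads the speeds as megabits per second, as in the thread title. The 900 figure is only an assumed example from within the reported 80-95% range, and the function name is made up for illustration.

```python
def utilisation_percent(throughput_mbps: float, link_speed_mbps: float) -> float:
    """Return throughput as a percentage of the nominal link speed."""
    if link_speed_mbps <= 0:
        raise ValueError("link speed must be positive")
    return 100.0 * throughput_mbps / link_speed_mbps


# Single 1 Gb link: an assumed 900 Mb/s sits inside the reported 80-95% range.
single_link = utilisation_percent(900, 1000)

# With the etherchannel configured: 100 Mb/s measured against one 1 Gb port.
lag_vs_one_port = utilisation_percent(100, 1000)

# The same 100 Mb/s measured against the nominal 2 x 1 Gb aggregate.
lag_vs_aggregate = utilisation_percent(100, 2000)

print(single_link, lag_vs_one_port, lag_vs_aggregate)
```

Against a single port the aggregated link sits at 10%, which matches the flat graph on the network monitor. Against the nominal 2x 1Gb aggregate it is only 5%.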

This indicates to me a major issue with the etherchannel configuration or, more likely, a bug within the switch firmware. Any advice would be appreciated.

We are not running the latest firmware yet (we are on 2.0.3); however, I have read the release notes for the newer versions (2.1.16 and 2.1.19) and there is no indication of a fix for etherchannel/LACP performance issues.


David Hornstein
Level 7

Tim,

But I must admit I would be looking at the following:

  • what does the error log show?
  • initial setup of cabling, remembering there are four combo ports for SFP on the 24 port model.

      (Combo SFP slots share a common port number between copper and SFP, with only 1 port active at a time.)

  • RMON and etherlike statistics for the interfaces, to check for a broadcast storm or multicast storm
  • state of spanning tree on the interfaces: are there any blocking ports?
  • check the configuration of the LAG interfaces and see if LACP is checked.
  • hopefully you have set port speeds back to auto negotiation
  • might be worthwhile using a disruptive tool, after hours, to check the wiring infrastructure as per screen capture below. It uses time domain reflectometry to check cable quality, but it does kill the link when it checks cables. It's a reasonable test of the cable plant...

But I would be updating the firmware as a first step, as it's the easiest thing to do (after hours).

That means that if development has to look at the switch firmware, development is looking at current firmware, not legacy firmware.

regards Dave


OK - First things first - a pain in the neck but all switches now on the latest firmware!

David - In answer to the points you have asked me to clarify, I had already considered all of them before posting here, but here are your points with relevant comments:

  • what does the error log show?

NOTHING (i.e. error free operation other than system restarts and link up/link down messages)

Regarding Spanning Tree:

Switch 2 is the Spanning Tree Root Bridge (configured with the lowest bridge priority to ensure this). There are no spanning tree issues. There are also only 1Gb link paths between switches and, as mentioned, I have set the admin speed, so there is ZERO chance of a root-bridge path occurring through either a 100Mb port or a 1Gb port operating at 100Mb.
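As a rough illustration of why the lowest bridge priority guarantees the root, here is a minimal sketch of the standard spanning tree election rule: the lowest bridge ID wins, comparing priority first and MAC address second. The priority values and MAC addresses below are hypothetical placeholders, not taken from the actual switch configs.

```python
from typing import List, NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int  # configurable bridge priority
    mac: str       # bridge MAC address, used as the tie-breaker


def elect_root(bridges: List[Bridge]) -> Bridge:
    """Return the bridge with the lowest (priority, MAC) bridge ID."""
    return min(bridges, key=lambda b: (b.priority, int(b.mac.replace(":", ""), 16)))


# Hypothetical values: Switch 2 has been given a lower priority than the others.
switches = [
    Bridge("Switch 1", 32768, "00:00:00:00:00:01"),
    Bridge("Switch 2", 4096, "00:00:00:00:00:03"),
    Bridge("Switch 3", 32768, "00:00:00:00:00:02"),
]

print(elect_root(switches).name)  # Switch 2
```

If all priorities were left equal, the lowest MAC address would decide the root instead, which is why the priority is set explicitly here.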

  • initial setup of cabling, remembering there are four combo ports for SFP on the 24 port model.

      (Combo SFP slots share a common port number between copper and SFP, with only 1 port active at a time.)

Understood, this is not an issue. Ports 11, 12, 23 and 24 are shared with the SFPs and can be used for copper OR SFP but not both.

  • rmon and etherlike statistics for the interfaces to check for a broadcast storm or multicast storm

Presumably this would be due to a spanning tree / switching loop. I understand spanning tree and have it configured and working well; this is not a problem.

In addition to this, I would point out that link aggregation has been active for probably 12 months or more on this network, and nobody noticed that throughput was bottlenecking until I did some isolated tests of throughput across switches and identified a definite issue (it was continuing complaints about average network performance that caused this to be investigated).

If spanning tree was a problem the network would pretty much cease functioning within minutes/hours, and any other broadcast storm would be an issue regardless of whether link aggregation was in use (we have experienced this inadvertently in the past, don't worry).

  • state of Spanning tree on the interfaces, are there any blocking ports?

No Blocking ports - Correct root bridge on each switch as well

  • check the configuration of the LAG interfaces and see if LACP is checked.

Yes - Using LACP

  • Hopefully you have set  port speeds back to auto negotiation

Well, this isn't very important IMHO but I'll do it if you like. I only set them manually to see if it made a difference (which it didn't, and regardless, the ports are linking at 1Gb anyway).

  • might be worthwhile using a disruptive tool, after hours, to check the wiring infrastructure as per screen capture below. It uses time domain reflectometry to check cable quality, but it does kill the link when it checks cables. It's a reasonable test of the cable plant...

I have seen this tool before. In the simplest demonstration of this issue the switches are side by side, and yes, cables have been replaced (they are only 1m cables at the longest), so cables can be eliminated from the equation. Besides, this issue occurs on ALL AGGREGATED LINKS (including links to the older Linksys SRW224G4 switches, which also support LACP).

Sorry for the long post. I think that answers your most important points, and hopefully you can see I have given the issue appropriate consideration before posting again.

What do we do from here??!! I'm sure that you must be able to replicate this issue in a lab with 2 switches and 2 laptops and get the same results. I am happy to securely forward you my switch configs as well if it helps.
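For a two-laptop lab test like this, something along the following lines could be used to measure raw TCP throughput without involving disks or file shares. This is only a sketch under assumptions: the function names are made up for illustration, the chunk and transfer sizes are arbitrary, and the demo runs both ends on 127.0.0.1 just to show that it works. In a real test the receiver would run on one laptop and the sender on the other, pointed at the receiver's address.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary choice)


def receive_all(server: socket.socket, result: dict) -> None:
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def send_bytes(host: str, port: int, total_bytes: int) -> float:
    """Send total_bytes of zeros to host:port and return the elapsed seconds."""
    payload = b"\0" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
    return time.perf_counter() - start


def megabits_per_second(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and duration into megabits per second."""
    return num_bytes * 8 / seconds / 1_000_000


def loopback_demo(total_bytes: int = 8 * 1024 * 1024):
    """Run receiver and sender on 127.0.0.1 and return (bytes received, Mb/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result: dict = {}
    receiver = threading.Thread(target=receive_all, args=(server, result))
    receiver.start()
    elapsed = send_bytes("127.0.0.1", port, total_bytes)
    receiver.join()
    server.close()
    return result["bytes"], megabits_per_second(result["bytes"], elapsed)


if __name__ == "__main__":
    received, rate = loopback_demo()
    print(f"received {received} bytes at roughly {rate:.0f} Mb/s")
```

The time is taken on the sending side, so short transfers will read a little high because of socket buffering. Running the same transfer once over a single link and once over the aggregated link should be enough to show whether the drop to around 100Mb appears.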

tim
Level 1

Is anyone in Cisco looking into this or able to comment?

Cheers

Tim

Hi Tim

I cannot duplicate your issue, as I don't have identical models to the switches you are using.

I humbly suggest you approach the small business support center and allow the folks there to try to simulate your issue or, even better, allow them to Webex in and see the setup. You never know, they might spot something that you missed.

It is a weird symptom: slower performance with link aggregation when compared to a single link.

http://www.cisco.com/en/US/support/tsd_cisco_small_business_support_center_contacts.html

Dave

Hi all,

While I thought I was the only one having this issue, I stumbled on this post.

I am having exactly the same problems with my ESW540-8p connected to a Synology NAS.

On one NIC it goes up to over 100MB/sec; with LACP, not more than 10MB/sec over 2 NICs. (After restart)

Running latest firmware, checked everything..

Did this ever get solved??
