Every "once in a while" (about a week with my network load), the SG300 crawls to an almost complete standstill of all IPv4 activity (I haven't got IPv6 enabled here yet, so I can't comment on whether this applies to IPv6 as well). Pure L2 transmission seems unaffected, but L3 definitely is (down to 20 Mbit/s as opposed to the 600+ Mbit/s rate I usually get).
Soft-rebooting the switch (via the web GUI) helps, and the intervals between the crashes seem related to total bytes transmitted, not time. I have jumbo frames enabled, and flow control disabled on the switch and all hosts.
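One way to back up the "bytes transmitted, not time" hunch would be to record the switch's uptime and total interface octet counters (e.g. via SNMP) at each crash, then compare how evenly the crashes are spaced along each axis. A minimal sketch, with entirely made-up failure-log values:

```python
from statistics import mean, stdev

def consistency(deltas):
    """Coefficient of variation: lower means the failures are more
    evenly spaced along this axis, i.e. it better predicts a crash."""
    return stdev(deltas) / mean(deltas)

def compare(failures):
    """failures: list of (uptime_seconds, total_bytes) sampled at each crash.
    Returns the spread of the between-crash deltas for both axes."""
    times = [b[0] - a[0] for a, b in zip(failures, failures[1:])]
    octets = [b[1] - a[1] for a, b in zip(failures, failures[1:])]
    return {"time_cv": consistency(times), "bytes_cv": consistency(octets)}
```

If `bytes_cv` comes out consistently lower than `time_cv`, the byte-count theory gains some weight.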
Firmware Version is 22.214.171.124, md5 1987292110f5657e74308dde30c03dc4
Boot Version is 126.96.36.199 md5 4c9a0b6a9f1346736646d08ab94ae2ac
Are you connecting multiple switches to this device, or is it alone with the firewall and end users?
If you're using it with other switches, is STP enabled?
When the network slows to a crawl, what symptoms do you see? Does it look like there is an enormous amount of activity on the port LEDs?
RSTP is enabled, and the switch is connected via a single link (one 1 Gb/s cable) to one other switch. This switch is primarily used for the server segment, with one workstation connected, plus the SA540 firewall.
No LED Christmas tree (no indication of a network loop or packet storm). The switch just gradually loses speed, and it applies to both L2 and L3 traffic. My personal guess would be some sort of memory leak, but without knowing the internal workings of the ASICs used, this is just guesswork.
You read my mind about a possible broadcast storm.
One thing you could try is to set up VLAN mirroring and capture the VLAN traffic to see what is going on through the network.
This will let you see whether something behind the scenes is possibly killing the network.
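Once frames are mirrored to a capture host, one quick sanity check along these lines is to tally destination MAC types and see whether broadcast or multicast dominates. A hypothetical sketch that classifies raw Ethernet frames (in practice you would feed it frame bytes extracted from a pcap):

```python
from collections import Counter

def classify(frame: bytes) -> str:
    """Classify an Ethernet frame by its destination MAC (first 6 bytes)."""
    dst = frame[:6]
    if dst == b"\xff\xff\xff\xff\xff\xff":
        return "broadcast"
    if dst[0] & 0x01:  # I/G (group) bit set -> multicast
        return "multicast"
    return "unicast"

def tally(frames):
    """Count frames per destination type; a storm shows up as an
    overwhelming broadcast/multicast share."""
    return Counter(classify(f) for f in frames)
```

A healthy segment is overwhelmingly unicast; a storm flips that ratio.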
Are you on the latest firmware for the switch?
Been there, done that. Nothing of interest found, only an increasing latency.
As for the firmware ... There is more than one available? (Running 188.8.131.52)
When the network moves to a crawl, have you looked at the rmon port statistics and seen any collisions or anything out of the ordinary?
Nothing to find there (I actually did that the second time I experienced it). The _ONLY_ thing I have found through normal debugging is that the L2 latency of the unit gradually increases (which of course affects L3 as well). It's almost as if the unit has some sort of packet-count-induced latency. Alas, the unit lacks a proper IOS for debugging.
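That "gradually increasing latency" impression can be made measurable by logging RTTs through the switch (e.g. periodic pings between two hosts on it) and fitting a trend line. A minimal sketch, assuming you collect the (timestamp, RTT) samples yourself; the data here is made up:

```python
def slope(samples):
    """Least-squares slope of (t_seconds, rtt_ms) samples; a persistently
    positive slope means latency is drifting upward over time."""
    n = len(samples)
    mt = sum(t for t, _ in samples) / n
    mr = sum(r for _, r in samples) / n
    num = sum((t - mt) * (r - mr) for t, r in samples)
    den = sum((t - mt) ** 2 for t, _ in samples)
    return num / den
```

A slope that stays positive across hours of samples would turn the hunch into a number you can graph against bytes transmitted.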
Degradation of the traffic rate can be caused by many issues. Would it be possible to provide some additional information which may allow us to zoom in on the root cause?
Some points which may help to focus on the cause:
1) Device configuration: VLANs and other features (QoS, rate limiting, etc.).
2) Initially you mention that latency affects IPv4 traffic, but then you mention that it also affects L2 traffic. Is the latter an update following further examination?
3) Does the device run in Layer 3 mode? Did you have the opportunity to check it in L2 mode (if relevant for your network)?
4) What was the main indication that the traffic rate dropped?
5) I did not understand whether the SG300-28P has only 1 station connected to it, or also the other switch. Can the general network topology be provided? Many times it is not a specific device issue, but rather a network issue (maybe the latency is related to the other device).
6) If there is more than one device connected to the SG300-28P switch, does latency also occur between 2 stations connected directly to it, or only when passing via the other switch?
7) Is there any pattern in when the traffic rate drops? After a certain time? Other?
8) Does it happen on any specific ports, or all?
9) Does it happen to specific types of traffic (TCP/UDP, etc.)?
1) No rate limits are set. QoS is set to DSCP. The switch is not saturated with traffic.
2) The latency is seemingly in layer 2 forwarding (which of course also affects layer 3 traffic). Basically, the delay between Ethernet frames starts increasing after a while (knock on wood, the switch has behaved properly for a week now).
3) It runs in layer 3 mode, with some pure access interfaces (with /30 netmasks) and some VLAN L3 interfaces to do inter-VLAN routing. I cannot place the switch into layer 2 mode without doing a major network redesign.
4) I mostly notice it on the 8 interfaces handling L2-only iSCSI traffic (2 targets with 2 NICs each, two VMware ESXi 4.1 hosts with two vmk-bound interfaces each), but it's visible on the entire switch.
5) I may have been a little vague here: 1 workstation (the others are on different switches, separated by some distance); the rest of the switch is for servers and the border gateway/firewall (SA540).
6) See 4 (interfaces 3,4 5,6 7,8 9,10 are the iSCSI traffic interfaces).
7) If there is a pattern, I don't see it.
8) It's a layer 2 issue.
I suspect the layer 3 problems are only a symptom of the layer 2 issues.
Of course I am hoping you will not have additional trouble, but just in case it does occur, can you check whether you have oversubscription of traffic (in other words, congestion on output ports)?
One way to check this is to monitor drops on egress ports via the queue statistics (screen Quality of Service --> QoS Statistics --> Queue Statistics), and then define a counter for the relevant port on all queues and DPs.
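If you poll those queue statistics periodically (by hand via the GUI, or via SNMP), congestion shows up as drop counters that keep climbing between polls. A hypothetical sketch comparing two snapshots, with made-up port/queue keys:

```python
def rising_drops(prev, curr):
    """prev/curr: {(port, queue): drop_count} snapshots taken some time apart.
    Returns the increase per counter that grew, i.e. where drops are still
    accumulating -- a sign of congestion on that egress queue."""
    return {k: curr[k] - prev.get(k, 0)
            for k in curr
            if curr[k] > prev.get(k, 0)}
```

An empty result across several polling intervals would rule congestion out fairly convincingly.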
Another thing you may check is whether the switch is failing to learn MAC addresses for some reason, which may cause flooding. Check this in the MAC Address Tables --> Dynamic Address screen.
No congestion, and no MAC address trouble. No overloads of any kind; that's what worries me so much. Just a steady increase in latency as time passes. I had to hard-boot it less than 5 minutes ago because things were getting intolerable again.
I mostly notice it on the 8 interfaces handling L2-only iSCSI traffic (2 targets with 2 NICs each, two VMware ESXi 4.1 hosts with two vmk-bound interfaces each), but it's visible on the entire switch.
Are the 2 NICs on each bonded? What mode are they using: round robin/load balance, or failover? Same with the VMware hosts: are they bonded, and how are they configured? I have seen latency before with different brands of switches communicating with devices that have multiple connections to the network on the same subnet, especially with virtual switches in ESXi and Hyper-V. From what I have seen, the best configuration is to see if the servers will do LAG/EtherChannel, with LACP (802.3ad) on the SG300. Also set up all server ports to forward statically under STP, by setting Edge Port to enabled. You may have to look at your server specs to see if configuring LACP is an option. I am guessing you are not able to test for any length of time on only 1 port per server.
No bundling of any kind involved. I run two separate IPs (on different subnets) per iSCSI target unit (connected to separate VLANs), and each of the VMware ESXi-connected NICs goes to a separate vSwitch, with two VMK interfaces per vSwitch (one for each of the iSCSI VLANs), so LACP/LAG is not the issue here (guess three times whether that setup is based on previous encounters with lack-of-LACP stability).
Over the last 18 months, we have also been struggling with workstations slowing to a crawl on all of our 5 switches. All switches are still currently running 184.108.40.206.
Reboot switch and all is well.
Disabling flow control on all workstations seems to have reduced how often we need to reboot the switches. Having jumbo frames enabled or disabled didn't make a difference.
Over the next few days, we will upgrade to 220.127.116.11.
FWIW Small Business Switch Firmware updates.
I know this is an old topic, but I am curious whether the upgrade resolved the issue. I have an SG300-20 that seems to be having the same issue. I reboot and everything is good, although I am running version 18.104.22.168.