
Random port drops

Davy.Cave
Level 1

Hi,

Lately I've noticed some strange behavior on some of the switchports.

When I go through the logs of my SGE2000/2010 stack, I see that some of the ports randomly lose their connection:

2147482703 05-Jan-2013 04:11:43  Warning %LINK-W-Down:  2/g14       
2147482704 05-Jan-2013 03:35:20  Warning %STP-W-PORTSTATUS: 2/g33: STP status Forwarding       
2147482705 05-Jan-2013 03:34:50  Informational %LINK-I-Up:  2/g33       
2147482706 05-Jan-2013 03:34:47  Warning %LINK-W-Down:  2/g33       
2147482707 05-Jan-2013 03:34:19  Informational %LINK-I-Up:  2/g33       
2147482708 05-Jan-2013 03:34:17  Warning %LINK-W-Down:  2/g33       
2147482709 05-Jan-2013 03:34:15  Informational %LINK-I-Up:  2/g33       
2147482710 05-Jan-2013 03:34:14  Warning %LINK-W-Down:  2/g33       
2147482711 05-Jan-2013 03:34:12  Warning %STP-W-PORTSTATUS: 1/g15: STP status Forwarding       
2147482712 05-Jan-2013 03:33:42  Informational %LINK-I-Up:  1/g15       
2147482713 05-Jan-2013 03:33:40  Warning %LINK-W-Down:  1/g15       
2147482714 05-Jan-2013 03:33:20  Warning %STP-W-PORTSTATUS: 1/g15: STP status Forwarding       
2147482715 05-Jan-2013 03:32:50  Informational %LINK-I-Up:  1/g15       
2147482716 05-Jan-2013 03:32:47  Warning %LINK-W-Down:  1/g15       
2147482717 05-Jan-2013 03:31:48  Warning %STP-W-PORTSTATUS: 2/g5: STP status Forwarding       
2147482718 05-Jan-2013 03:31:18  Informational %LINK-I-Up:  2/g5     

I'm having trouble locating the source of the problem. The devices connected to the port are servers and desktops.

This happens frequently throughout the day, but not always on the same ports.
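
To see which ports drop most often, the log can be tallied per port. A minimal Python sketch, assuming the log has been saved as text in the format shown above; `count_link_downs` and `SAMPLE` are illustrative names, not part of any switch tooling:

```python
import re
from collections import Counter

# Matches lines such as:
# 2147482706 05-Jan-2013 03:34:47  Warning %LINK-W-Down:  2/g33
DOWN_RE = re.compile(r"%LINK-W-Down:\s+(\S+)")

def count_link_downs(log_text):
    """Return a Counter mapping port name -> number of LINK-W-Down events."""
    return Counter(m.group(1) for m in DOWN_RE.finditer(log_text))

# A few lines copied from the log above (illustrative subset).
SAMPLE = """\
2147482703 05-Jan-2013 04:11:43  Warning %LINK-W-Down:  2/g14
2147482705 05-Jan-2013 03:34:50  Informational %LINK-I-Up:  2/g33
2147482706 05-Jan-2013 03:34:47  Warning %LINK-W-Down:  2/g33
2147482708 05-Jan-2013 03:34:17  Warning %LINK-W-Down:  2/g33
2147482710 05-Jan-2013 03:34:14  Warning %LINK-W-Down:  2/g33
2147482713 05-Jan-2013 03:33:40  Warning %LINK-W-Down:  1/g15
2147482716 05-Jan-2013 03:32:47  Warning %LINK-W-Down:  1/g15
"""

downs = count_link_downs(SAMPLE)
# Print ports with the most drops first.
for port, count in downs.most_common():
    print(port, count)
```

Run over a full day of logs, a tally like this shows whether the drops cluster on a few ports (pointing at cables or NICs) or spread evenly (pointing at the switch itself).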

What could cause the random drops?

Thanks in advance!

Accepted Solutions

Tom Watts
VIP Alumni

Hi Davy, looks like you've got a stack.  The stack implementation on the older SFE/SGE switches wasn't great and has some stability issues.

The common causes for ports to go up/down may include

  • Spanning tree (such as root bridge elections or max age timeouts resetting the tables)
  • Negotiation (speed/duplex)
  • Discovery protocols such as Bonjour
  • Overutilization (system resources or Layer 3)
  • Firmware problems

If it is at all possible, I'd break the stack and run the switches standalone. I would attribute 90% of the problems to the stack. Most of the time it's just that, unfortunately.

If you'd like to troubleshoot the 5 points listed above, start by making sure your root bridges are set correctly, so that max age timeouts don't cause the CAM tables to be dropped.
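
To check how much spanning tree adds to each outage, one option is to measure the gap between a `LINK-I-Up` line and the following `STP status Forwarding` line for the same port. A rough Python sketch, assuming the log format from the original post; `forwarding_delays` and `SAMPLE` are illustrative names:

```python
import re
from datetime import datetime

EVENT_RE = re.compile(
    r"(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2})\s+\w+\s+"
    r"%(LINK-I-Up|STP-W-PORTSTATUS):\s+([\w/]+)(?::\s+STP status (\w+))?"
)

def forwarding_delays(log_text):
    """Return (port, seconds) pairs: time from link up to STP Forwarding."""
    events = []
    for m in EVENT_RE.finditer(log_text):
        ts = datetime.strptime(m.group(1), "%d-%b-%Y %H:%M:%S")
        events.append((ts, m.group(2), m.group(3), m.group(4)))
    events.sort(key=lambda e: e[0])  # the switch lists newest entries first
    last_up = {}
    delays = []
    for ts, kind, port, status in events:
        if kind == "LINK-I-Up":
            last_up[port] = ts
        elif status == "Forwarding" and port in last_up:
            gap = ts - last_up.pop(port)
            delays.append((port, int(gap.total_seconds())))
    return delays

# Four lines copied from the log in the original post.
SAMPLE = """\
2147482704 05-Jan-2013 03:35:20  Warning %STP-W-PORTSTATUS: 2/g33: STP status Forwarding
2147482705 05-Jan-2013 03:34:50  Informational %LINK-I-Up:  2/g33
2147482711 05-Jan-2013 03:34:12  Warning %STP-W-PORTSTATUS: 1/g15: STP status Forwarding
2147482712 05-Jan-2013 03:33:42  Informational %LINK-I-Up:  1/g15
"""

delays = forwarding_delays(SAMPLE)
print(delays)
```

On these lines the gap is 30 seconds for both ports, which equals twice the 802.1D default forward delay of 15 seconds (listening plus learning). If the switches are running default timers, that suggests these ports are not set up as edge/fast-link ports, so every flap costs roughly half a minute of connectivity on top of the drop itself. That is an inference from the log, not something confirmed in this thread.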

You may also manually set port speed and duplex instead of auto-negotiating to see if it stabilizes the connection. Discovery protocols like Bonjour can cause unexpected errors, so you may want to disable them.

If the switches have a really heavy load or high CPU/memory use, you may try removing a few connections. If the switches are operating in Layer 3 mode, you may be experiencing SFFT overflow errors since the software can't route fast enough.

Of course, it could always be a firmware issue. Make sure you're on the latest!

-Tom
Please mark answered for helpful posts



Hi Tom,

First of all, thanks for the reply!

I will try your suggestions and will give feedback on it asap.

Our firmware is indeed outdated, so I'll give that a shot first.

Hi,

I've tried the answers you suggested, but so far I've been out of luck.

We do have some stand-alone SGE2000 switches in our network as well.

They've been showing the same behavior as the stacks:

2147483044 15-Jan-2013 13:26:43  Warning %STP-W-PORTSTATUS: g4: STP status Forwarding
2147483045 15-Jan-2013 13:26:41  Warning %LINK-W-Down:  g4       
2147483046 15-Jan-2013 08:30:07  Informational %LINK-I-Up:  g4       
2147483047 15-Jan-2013 08:30:07  Warning %STP-W-PORTSTATUS: g4: STP status Forwarding       
2147483048 15-Jan-2013 08:30:04  Warning %LINK-W-Down:  g4       
2147483049 15-Jan-2013 08:30:04  Informational %LINK-I-Up:  g4       
2147483050 15-Jan-2013 08:30:04  Warning %STP-W-PORTSTATUS: g4: STP status Forwarding       
2147483051 15-Jan-2013 08:30:02  Warning %LINK-W-Down:  g4       

We do have a lot of STP topology changes when I check it in the properties screen.

Might this be the cause of it?

And if so, how can I troubleshoot this?

Root bridge elections are all in order and the max age timer is set to 20 seconds.

Also, our last topology change was 3 days ago, but we get these random port drops every day.

Hi Davy, each switch has a default bridge priority of 32768. What you want to do is give the head-most switch a priority of 4096, then the next in line 8192, the next 12288, etc., incrementing by 4096. Additionally, you may try to globally filter BPDUs.
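
The priority scheme can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the switch names and MAC addresses below are made up, and `priority_plan` and `elect_root` are hypothetical helpers, not switch commands. It relies on two general spanning tree rules: priorities are multiples of 4096 up to 61440, and the bridge with the lowest priority (then lowest MAC) becomes root.

```python
STEP = 4096
MAX_PRIORITY = 61440

def priority_plan(switch_names, start=STEP):
    """Assign ascending priorities in steps of 4096, head-most switch first."""
    plan = {}
    for index, name in enumerate(switch_names):
        priority = start + index * STEP
        if priority > MAX_PRIORITY:
            raise ValueError("too many switches for distinct priorities")
        plan[name] = priority
    return plan

def elect_root(bridges):
    """bridges maps name -> (priority, mac); lowest (priority, mac) wins."""
    for name, (priority, _mac) in bridges.items():
        if priority % STEP or not 0 <= priority <= MAX_PRIORITY:
            raise ValueError(f"{name}: priority must be a multiple of {STEP}")
    return min(bridges, key=lambda name: bridges[name])

# Hypothetical switch names, ordered from the head of the network outward.
plan = priority_plan(["core", "distribution", "access"])
print(plan)

# Hypothetical MACs, written in one consistent lowercase format so that
# plain string comparison orders them correctly.
root = elect_root({
    "core": (plan["core"], "00:00:00:00:00:03"),
    "distribution": (plan["distribution"], "00:00:00:00:00:02"),
    "untouched": (32768, "00:00:00:00:00:01"),
})
print(root)
```

The switch left at the default 32768 has the lowest MAC here, yet it still loses the election, which is the point of setting priorities explicitly.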

-Tom

Tom,

This has already been configured.

Our first stack is the root bridge with 24576

Our backup root bridge has 28672

Our other (stand-alone SGE2000 switches) are configured as 32768.

I have configured BPDU filtering instead of flooding on all our switches as well.

I've added a picture to give you a better view of the topology:

Hi,

An update on the situation so far.

Setting the port to a static value instead of auto-negotiation seems to have helped on our stand-alone switches!

The problem still persists on some of the ports on the stacks though.

This raised a few questions:

  • Why did the auto negotiation setting cause this problem?
  • As mentioned earlier, is breaking up the stack the only way to completely get rid of this problem?
    • What is the cause of the stack instability anyway?
  • Also, if we decide to break up our stack, we will redesign our network.
    For the moment, our network is a full L2 model. Would it be beneficial to implement a collapsed core model, for example?
    • And if so, what performance could we expect from our SGE2000/2010s in L3 mode?

Thanks in advance!

Hey Davy,

Thanks for the couple questions back. I'm not sure I'll give you the greatest answer but I will try.

Auto negotiation can be affected by a myriad of things. It could be (and some may seem silly...) the switch being gigE and a NIC being 100: if the NIC is not advertising, it is up to the switch to figure out what it is doing, and it falls back to half duplex. This can lead to duplex mismatch, etc. This is rarely seen on gigE links between node and switch, since half duplex is defined for gigE but almost never used in practice. It can also be the media used: Cat5 is specified for 100 Mbit, while Cat5e and Cat6 are both rated for gigE. A marginal, damaged or badly terminated run can still make a gigabit link drop and renegotiate, so it may be what's in between giving the fits. I'd recommend testing or swapping the cable on an affected port before anything else.
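
The duplex mismatch case can be modelled in a few lines. This is a simplified Python sketch of the general rules (both ends autonegotiating pick the best common mode; an autonegotiating end facing a forced 10/100 peer detects the speed but assumes half duplex). The helper names `auto`, `forced` and `resolve` are made up for illustration, and forced gigabit is deliberately left out of the model.

```python
# Modes from most to least preferred: faster first, full duplex before half.
PRIORITY = [(1000, "full"), (1000, "half"), (100, "full"),
            (100, "half"), (10, "full"), (10, "half")]

def auto(modes):
    """An end that autonegotiates and advertises the given modes."""
    return ("auto", frozenset(modes))

def forced(speed, duplex):
    """An end hard-set to one speed/duplex, advertising nothing."""
    return ("forced", (speed, duplex))

def resolve(a, b):
    """Return (mode used by a, mode used by b), or None if no link comes up."""
    kind_a, val_a = a
    kind_b, val_b = b
    if kind_a == "auto" and kind_b == "auto":
        for mode in PRIORITY:
            if mode in val_a and mode in val_b:
                return mode, mode
        return None
    if kind_a == "forced" and kind_b == "forced":
        return (val_a, val_b) if val_a[0] == val_b[0] else None
    if kind_a == "forced":
        swapped = resolve(b, a)
        return None if swapped is None else (swapped[1], swapped[0])
    # a autonegotiates, b is forced: a can sense the speed but not the duplex.
    speed = val_b[0]
    if speed not in (10, 100):
        raise ValueError("forced speeds other than 10/100 are not modelled")
    if not any(s == speed for s, _ in val_a):
        return None
    return (speed, "half"), val_b

switch = auto(PRIORITY)        # gigE switch port left on auto
nic = forced(100, "full")      # NIC hard-set to 100/full
result = resolve(switch, nic)
print(result)
print("duplex mismatch:", result[0][1] != result[1][1])
```

With both ends on auto the same function settles on 1000/full, which is why either leaving both sides on auto or fixing both sides identically tends to behave, while mixing the two does not.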

Second question: if you break the stack, the topology doesn't have to change. I do recommend a couple of redundant links somewhere, just in case there is a layer 1 break, and let spanning tree be spanning tree. You never want a switch isolated due to wiring issues.

Last one, L3 mode: there is no performance benefit from the switch's point of view. If you don't need the switches routing, don't use it. If your router is overloaded, making the switch L3 will alleviate the router load and only send it traffic that needs a router resource (such as internet).

-Tom

Hi Tom,

We use Cat5e cabling throughout the building, so it could indeed be the wiring.

Anyways, thanks a lot for your time and help!

I've marked this question as answered! :-)