01-04-2011 03:38 PM - edited 03-06-2019 02:49 PM
I'm having an issue with my network: we are experiencing random, brief network outages. They happen a couple of times a day and last 5-10 seconds. When I check my two backbone switches (4506, Supervisor: WS-X4516-10GE, IOS: cat4500-ipbase-mz.122-31.SGA8.bin), STP remains normal and no topology change occurs.
01-05-2011 10:20 AM
A couple of questions:
First, how many packets per second are those two servers sending? The Sup V-10GE is rated at 136 Gbps, but its actual forwarding rate is 102 million packets per second. Many people focus on the first number, but the real limit is the second. As a possible example, let's say the 18GB of data is being sent in 1KB chunks instead of 1.5KB (the Ethernet MTU) chunks: 18GB per minute = 2.4 Gbit/s, or roughly 300,000 packets per second per server.
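The throughput and packet-rate arithmetic can be checked with a short Python sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 KB = 1,000 bytes) and ignores Ethernet framing overhead (headers, preamble, inter-frame gap), so real packet rates for a given payload will differ slightly; the function name is just illustrative.

```python
GB = 1_000_000_000  # assumption: decimal gigabyte (10**9 bytes)


def rates_from_volume(bytes_per_minute: float, packet_size_bytes: float) -> tuple[float, float]:
    """Return (bits per second, packets per second) for a steady transfer.

    Assumes every packet carries exactly packet_size_bytes of data and
    ignores Ethernet framing overhead (headers, preamble, inter-frame gap).
    """
    bytes_per_second = bytes_per_minute / 60
    bits_per_second = bytes_per_second * 8
    packets_per_second = bytes_per_second / packet_size_bytes
    return bits_per_second, packets_per_second


if __name__ == "__main__":
    for size in (1_000, 1_500):
        bps, pps = rates_from_volume(18 * GB, size)
        print(f"{size} B packets: {bps / 1e9:.1f} Gbit/s, {pps:,.0f} packets/s")
```

With 1,000-byte packets this works out to 2.4 Gbit/s and 300,000 packets/s per server; with 1,500-byte packets it is 200,000 packets/s.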
Second, what kind of line cards are you using? Some line cards are oversubscribed. One of those servers may share a port group with one or more uplinks or other critical ports; while the server is transmitting, that port group is effectively saturated, taking the other connections down with it.
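One way to check for this is to work out which port group each busy port falls into. The sketch below assumes a hypothetical card whose ports are split into contiguous groups of 8 that share 1 Gbps toward the backplane (an 8:1 ratio). The group size, group bandwidth and the port numbers in the example are all assumptions, so check the real values for your line card before drawing conclusions.

```python
from collections import defaultdict

# Assumptions for a hypothetical oversubscribed 48-port card; check your card's real values.
GROUP_SIZE = 8               # ports that share one path to the backplane
GROUP_BANDWIDTH_GBPS = 1.0   # bandwidth shared by a whole port group
PORT_SPEED_GBPS = 1.0        # speed of each front-panel port


def port_group(port: int, group_size: int = GROUP_SIZE) -> int:
    """Return the 0-based port group of a 1-based port number (contiguous groups assumed)."""
    if port < 1:
        raise ValueError("port numbers start at 1")
    return (port - 1) // group_size


def oversubscription_ratio(group_size: int = GROUP_SIZE,
                           port_speed: float = PORT_SPEED_GBPS,
                           group_bandwidth: float = GROUP_BANDWIDTH_GBPS) -> float:
    """Total port bandwidth in a group divided by the bandwidth the group actually gets."""
    return group_size * port_speed / group_bandwidth


def shared_groups(server_ports: list[int],
                  critical_ports: list[int]) -> dict[int, tuple[list[int], list[int]]]:
    """Return the groups that contain both a busy server port and a critical port."""
    groups = defaultdict(lambda: ([], []))  # group -> (server ports, critical ports)
    for port in server_ports:
        groups[port_group(port)][0].append(port)
    for port in critical_ports:
        groups[port_group(port)][1].append(port)
    return {g: (servers, critical)
            for g, (servers, critical) in groups.items() if servers and critical}


if __name__ == "__main__":
    print(f"Oversubscription: {oversubscription_ratio():.0f}:1")
    # Hypothetical port numbers: servers on 3 and 20, uplinks or phones on 5 and 40.
    print(shared_groups([3, 20], [5, 40]))
```

In this made-up example, server port 3 and critical port 5 land in the same group, which is the kind of overlap that would let one busy server starve its neighbours.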
Third, do any interfaces show errors? In particular, do any interfaces show output drops (indicating buffer or oversubscription issues), TX or RX errors, or collisions (indicating a duplex mismatch)?
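Checking these counters by eye across many ports is tedious. The Python sketch below pulls a few of them out of captured "show interfaces" text. The regular expressions assume the usual IOS wording ("Total output drops: N", "N input errors", "N CRC", "N output errors", "N collisions") and the sample text is a made-up excerpt, so adjust the patterns to whatever your switch actually prints.

```python
import re

# Patterns assume typical Cisco IOS "show interfaces" wording; verify against real output.
PATTERNS = {
    "output_drops": re.compile(r"Total output drops:\s*(\d+)"),
    "input_errors": re.compile(r"(\d+)\s+input errors"),
    "crc": re.compile(r"(\d+)\s+CRC"),
    "output_errors": re.compile(r"(\d+)\s+output errors"),
    "collisions": re.compile(r"(\d+)\s+collisions"),
}


def parse_counters(text: str) -> dict[str, int]:
    """Extract the counters above from one interface's output; missing ones are skipped."""
    counters = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            counters[name] = int(match.group(1))
    return counters


def problems(counters: dict[str, int]) -> list[str]:
    """Names of counters that are non-zero, sorted alphabetically."""
    return sorted(name for name, value in counters.items() if value > 0)


# Made-up excerpt for illustration only.
SAMPLE = """GigabitEthernet4/12 is up, line protocol is up (connected)
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 1532
     12 input errors, 12 CRC, 0 frame, 0 overrun, 0 ignored
     0 output errors, 0 collisions, 1 interface resets
"""

if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    print(counters)
    print("non-zero:", problems(counters))
```

Running it once per interface (for example on the output saved during an outage) makes it easy to see which ports are accumulating drops or errors.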
01-05-2011 12:59 PM
Hi nspitzer5
First, thank you for your help.
Today the problem occurred again. After analyzing the traffic, I did not find any evidence pointing to these two servers, yet the problem persisted. The slowness lasted 20 to 30 seconds. The STP topology remained stable throughout, and there was no broadcast or multicast traffic.
Second, the line cards are:
Mod Ports Card Type                              Model
---+-----+--------------------------------------+------------------
  1     6 Sup V-10GE 10GE (X2), 1000BaseX (SFP)  WS-X4516-10GE
  2     6 1000BaseX (GBIC)                       WS-X4306-GB
  3     6 1000BaseX (GBIC)                       WS-X4306-GB
  4    48 10/100/1000BaseT (RJ45)                WS-X4548-GB-RJ45
  6    48 10/100/1000BaseT (RJ45)                WS-X4548-GB-RJ45
Third, some server interfaces do show errors, but the uplink interfaces are clean.
Thanks
01-05-2011 01:43 PM
I have a few questions:
Who is slow/down (users, servers, etc.)?
Are all these devices in 1 VLAN/IP subnet, or are there multiple VLANs that are affected?
How many switches are part of the network that is slow?
Is this 4506 a core switch, or just access?
Which ports on the 4506 are affected by the slowness?
Is the CPU on the 4506 spiking during the slowness?
What is the utilization of each affected port when the slowness occurs? (Set "load-interval 30" on those interfaces to get the best picture.)
Can you quantify "slow or outage"? For example, loss of pings for x seconds?
Do you have any QoS configured on the affected devices?
Can you post a "show int" for the server ports during the outage?
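Once you have "show interfaces" output from the affected ports, utilization can be read either from the txload/rxload values (reported out of 255) or from the input/output rates relative to the interface bandwidth. The Python sketch below does both; it assumes the usual IOS wording ("BW N Kbit", "input rate N bits/sec", "output rate N bits/sec"), and the sample excerpt is invented for illustration.

```python
import re


def load_to_percent(load: int) -> float:
    """Convert an IOS txload/rxload value (reported as N/255) to a percentage."""
    return load / 255 * 100


def rate_to_percent(rate_bps: int, bandwidth_kbit: int) -> float:
    """Express a bits/sec rate as a percentage of the interface bandwidth (BW, in Kbit)."""
    return rate_bps / (bandwidth_kbit * 1000) * 100


def _first_int(pattern: str, text: str) -> int:
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1))


def parse_utilization(text: str) -> tuple[float, float]:
    """Return (input %, output %) from one interface's "show interfaces" text."""
    bandwidth = _first_int(r"BW (\d+) Kbit", text)
    in_rate = _first_int(r"input rate (\d+) bits/sec", text)
    out_rate = _first_int(r"output rate (\d+) bits/sec", text)
    return rate_to_percent(in_rate, bandwidth), rate_to_percent(out_rate, bandwidth)


# Made-up excerpt for illustration only ("30 second" rates assume load-interval 30).
SAMPLE = """  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 128/255, rxload 64/255
  30 second input rate 250000000 bits/sec, 20833 packets/sec
  30 second output rate 500000000 bits/sec, 41666 packets/sec
"""

if __name__ == "__main__":
    rx, tx = parse_utilization(SAMPLE)
    print(f"input {rx:.1f}%, output {tx:.1f}%")
    print(f"txload 128/255 is about {load_to_percent(128):.1f}%")
```

The rate-based figure is usually the more precise of the two, since txload/rxload are coarse 1/255 steps.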
You can always PM me to discuss more in detail if you wish.
01-05-2011 03:09 PM
Hi dbass,
Who is slow/down (users, servers,etc)?
Users
Are all these devices in 1 VLAN/IP subnet, or are there multiple VLANs that are affected?
There are multiple VLANs, but most users are in VLAN 1 (together with the servers).
We are currently moving the users in VLAN 1 to the appropriate VLANs.
How many switches are part of the network that is slow?
All switches.
Is this 4506 a core switch, or just access?
There are two 4506s in the core and eight 4503s in the access layer.
What are the ports on the 4506 that are affected with the slowness?
I do not know
Is the CPU on the 4506 spiking during the slowness?
The CPU is normal during the slowness.
Can you quantify "slow or outage"? IE: loss of pings for x number of seconds, etc?
Yes, there is a lot of ping loss during the slowness, and sometimes the outage also affects the IP phones.
Do you have any QoS configured on the affected devices?
No, but I will configure QoS on the LAN for VoIP.
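To put a number on "a lot of ping loss", one option is to ping a user-facing address once per second from a PC and record each result. The Python sketch below groups consecutive unanswered pings into outages and reports their durations and the overall loss. The (timestamp, replied) input format, the one-second interval and the function names are assumptions for illustration, not output from any particular ping tool.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Outage:
    start: float  # timestamp of the first unanswered ping
    end: float    # timestamp of the first reply afterwards (or the last sample)

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_outages(samples: Iterable[tuple[float, bool]]) -> list[Outage]:
    """Group consecutive unanswered pings into outages.

    samples are (timestamp_seconds, replied) pairs in time order.
    """
    outages: list[Outage] = []
    start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, replied in samples:
        if not replied and start is None:
            start = ts
        elif replied and start is not None:
            outages.append(Outage(start, ts))
            start = None
        last_ts = ts
    if start is not None and last_ts is not None:
        # Still down when the capture ended; close the outage at the last sample.
        outages.append(Outage(start, last_ts))
    return outages


def loss_percent(samples: list[tuple[float, bool]]) -> float:
    """Percentage of pings that got no reply."""
    if not samples:
        return 0.0
    lost = sum(1 for _, replied in samples if not replied)
    return lost * 100 / len(samples)


if __name__ == "__main__":
    # Hypothetical capture: one ping per second, no replies from t=2 to t=8.
    capture = [(float(t), not (2 <= t <= 8)) for t in range(20)]
    for outage in find_outages(capture):
        print(f"outage from t={outage.start:.0f}s lasting {outage.duration:.0f}s")
    print(f"loss: {loss_percent(capture):.1f}%")
```

Logging the outage start times this way also makes it easier to line them up with switch logs or HSRP state changes afterwards.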
Thank you for your help.
01-05-2011 07:59 PM
Do you run HSRP between your two 4506 core switches? If so, does HSRP fail over during your outage? Is spanning tree configured so that your core switches are the root and secondary root, and all of the access switches have a much higher (that is, less preferred) priority value?
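As a reminder of how the root is chosen: every bridge compares bridge IDs, and the numerically lowest one (priority first, then MAC address as the tie-breaker) becomes root. The Python sketch below models that comparison, including the per-VLAN extended system ID that adds the VLAN number to the priority. The switch names, MAC addresses and priority values (24576 for the primary core, 28672 for the secondary, the default 32768 for access) are illustrative assumptions, not values taken from this network.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int  # configured bridge priority (32768 is the usual default)
    mac: str       # base MAC address, e.g. "0011.2233.4455" or "00:11:22:33:44:55"


def bridge_id(bridge: Bridge, vlan: int) -> tuple[int, int]:
    """Bridge ID as a comparable tuple: (priority + VLAN ID, MAC as an integer).

    Adding the VLAN ID models the extended system ID used by per-VLAN spanning tree.
    """
    mac_value = int(bridge.mac.replace(":", "").replace(".", "").replace("-", ""), 16)
    return bridge.priority + vlan, mac_value


def elect_root(bridges: list[Bridge], vlan: int = 1) -> Bridge:
    """The bridge with the numerically lowest bridge ID becomes root."""
    return min(bridges, key=lambda b: bridge_id(b, vlan))


if __name__ == "__main__":
    bridges = [
        Bridge("access1", 32768, "0011.2233.4401"),
        Bridge("core1", 24576, "0011.2233.44ff"),
        Bridge("core2", 28672, "0011.2233.4400"),
    ]
    print("root:", elect_root(bridges).name)
    survivors = [b for b in bridges if b.name != "core1"]
    print("root if core1 fails:", elect_root(survivors).name)
```

The point of the design is that the secondary core, not an access switch, takes over as root if the primary fails; if an access switch ever won this comparison, traffic could take a very different path.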
As a general rule, never put servers in the same VLAN as your users, and you shouldn't be using VLAN 1 either. I would move off VLAN 1 to another VLAN number as soon as possible, as well as move the users into their own VLANs.
In the switch logs do you see any MAC addresses flapping between ports?
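If the logs are not conclusive, flapping can also be spotted from a series of observations of where each MAC address was learned. The Python sketch below flags any (VLAN, MAC) pair whose port changes at least twice within 10 seconds. How the observations are collected (for example by repeatedly polling the MAC address table), the window, the threshold, and the MAC addresses and port names in the example are all assumptions.

```python
from collections import defaultdict


def find_flapping(observations, window_seconds: float = 10.0,
                  min_moves: int = 2) -> set[tuple[int, str]]:
    """Return (vlan, mac) pairs whose port changed at least min_moves times within window_seconds.

    observations are (timestamp_seconds, vlan, mac, port) tuples in time order.
    """
    last_port: dict[tuple[int, str], str] = {}
    moves = defaultdict(list)  # (vlan, mac) -> timestamps of recent port changes
    flapping: set[tuple[int, str]] = set()
    for ts, vlan, mac, port in observations:
        key = (vlan, mac)
        previous = last_port.get(key)
        if previous is not None and previous != port:
            # Keep only moves still inside the sliding window, then record this one.
            recent = [t for t in moves[key] if ts - t <= window_seconds]
            recent.append(ts)
            moves[key] = recent
            if len(recent) >= min_moves:
                flapping.add(key)
        last_port[key] = port
    return flapping


if __name__ == "__main__":
    # Hypothetical observations: A bounces between two ports, B moves once, C moves rarely.
    observations = [
        (0.0, 1, "aaaa.aaaa.0001", "Gi1/1"),
        (0.0, 1, "bbbb.bbbb.0002", "Gi3/1"),
        (0.0, 1, "cccc.cccc.0003", "Gi4/1"),
        (1.0, 1, "aaaa.aaaa.0001", "Gi2/1"),
        (3.0, 1, "aaaa.aaaa.0001", "Gi1/1"),
        (5.0, 1, "bbbb.bbbb.0002", "Gi3/2"),
        (10.0, 1, "cccc.cccc.0003", "Gi4/2"),
        (50.0, 1, "cccc.cccc.0003", "Gi4/1"),
    ]
    print(find_flapping(observations))
```

A single move (such as a laptop being re-plugged) is normal; repeated moves between the same ports in a short time usually point to a loop or a misbehaving redundant link.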
Also, do your access switches have redundant connections to your core switches? Are they connected directly to the core switches or are they daisy chained off of one another?
Are the access switches all in the same building as the core switch?
01-06-2011 02:14 AM
Hi,
Do you HSRP between your 2 4506 core switches? If you do, does HSRP fail over during your outage?
Yes, I have HSRP between the two core switches. I don't know whether HSRP fails over, because during the outage I can't reach the switch via Telnet.
Do you have spanning tree configured so that your core switches are root and secondary root, and that all of the access switches are a much higher priority?
Yes, STP is configured correctly, and it does not fail during the outage. All the access switches have a higher priority value.
In the switch logs do you see any MAC addresses flapping between ports?
No, and as far as I can see STP is stable.
Also, do your access switches have redundant connections to your core switches? Are they connected directly to the core switches or are they daisy chained off of one another?
The access switches have redundant connections to my core switches through fabric links (GBIC).
Are the access switches all in the same building as the core switch?
Yes.
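Since HSRP failover during the outage is still an open question, it helps to know which core should be active. In HSRP the router with the highest priority becomes active (the IOS default priority is 100), and a tie is broken by the highest interface IP address. The Python sketch below models only a fresh election among the routers that are up; it does not model preemption (without "standby preempt", a higher-priority router that comes back later does not take over). The router names, priorities and addresses are hypothetical.

```python
import ipaddress
from typing import NamedTuple, Optional


class HsrpRouter(NamedTuple):
    name: str
    priority: int  # HSRP priority (IOS default is 100)
    ip: str        # this router's interface address in the HSRP group


def election_key(router: HsrpRouter):
    """Higher priority wins; on a tie, the higher IP address wins."""
    return router.priority, ipaddress.ip_address(router.ip)


def elect(routers: list[HsrpRouter]) -> tuple[Optional[str], Optional[str]]:
    """Return (active, standby) names for a fresh election among routers that are up."""
    ranked = sorted(routers, key=election_key, reverse=True)
    active = ranked[0].name if ranked else None
    standby = ranked[1].name if len(ranked) > 1 else None
    return active, standby


if __name__ == "__main__":
    core1 = HsrpRouter("core1", 110, "10.0.1.2")
    core2 = HsrpRouter("core2", 100, "10.0.1.3")
    print(elect([core1, core2]))  # both cores up
    print(elect([core2]))         # core1 down
```

If the HSRP active router and the STP root are different switches, traffic can take an extra hop across the inter-core link, which is worth checking while investigating the outages.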
Thank you for your help.