12-23-2014 01:04 AM - edited 03-07-2019 09:59 PM
Hi all,
Yesterday we had major issues on our LAN. We have 2 x 4507 core switches, with 2960S switches connected at 10 Gig to both cores.
The site called saying nothing was accessible. I checked the cores and both were at 100% CPU.
The process using the CPU was K5CpuMan Review.
A lot of the uplinks to the switches had been err-disabled with the reason "loopback error", and I could not reach most of the switches!
On one of the switches I managed to get on to, I ran show controllers utilization: 2 ports were transmitting at 100% and the uplink was receiving at 100%.
3 switches are still powered off and the core is running at 50% CPU. Does this seem high?
What could the issue be? A loop/broadcast storm, maybe?
What are the best commands on the 4507 for seeing what's going on?
cheers
12-23-2014 01:23 AM
Hi Carl,
Try Switch#show processes cpu
This command will hopefully narrow down which process is taking up a lot of CPU. This might be a silly question, but are you running a debug by mistake?
12-23-2014 07:06 AM
The issue still seems to be there, but it is better.
The CPU on core 1 is around 50%, and core 2 is about 30%.
Is that high?
It's not a big network, about 25 2960S switches connected at 10 Gig!
I did a packet sniff on a trunk port on the core, and I'm seeing around 5000 packets per second of multicast to 224.0.0.2 (HSRP)!
That seems highly excessive!!
How can we get a loop like this on all VLANs?
Could it be an issue on the cores?
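A quick back-of-the-envelope check shows why 5000 pps of HSRP looks wrong. HSRP version 1 hellos go to 224.0.0.2, and with default timers the active and standby routers each send one hello per group every 3 seconds. The sketch below assumes (not stated in the thread) default timers, both cores in every group, and one group per VLAN; the VLAN count of 50 is purely hypothetical.

```python
# Hypothetical sanity check of the sniffer numbers. Assumptions (not from the thread):
# HSRP default timers, both cores in every group, one HSRP group per VLAN.
HSRP_HELLO_INTERVAL_S = 3.0  # default HSRP hello timer
MAX_VLAN_IDS = 4094          # usable 802.1Q VLAN IDs (1-4094)


def expected_hello_pps(groups: int, routers_per_group: int = 2,
                       hello_interval_s: float = HSRP_HELLO_INTERVAL_S) -> float:
    """Steady-state HSRP hello rate on a trunk that carries every group."""
    return routers_per_group * groups / hello_interval_s


def groups_needed(observed_pps: float, routers_per_group: int = 2,
                  hello_interval_s: float = HSRP_HELLO_INTERVAL_S) -> float:
    """How many HSRP groups it would take to produce observed_pps legitimately."""
    return observed_pps * hello_interval_s / routers_per_group


if __name__ == "__main__":
    print(f"50 VLANs -> {expected_hello_pps(50):.1f} hellos/s")
    needed = groups_needed(5000)
    print(f"5000 pps would need {needed:.0f} groups; only {MAX_VLAN_IDS} VLAN IDs exist")
```

Legitimate hellos alone would need about 7500 groups to reach 5000 pps, more than the number of usable VLAN IDs, so the same frames are most likely being multiplied, which fits a loop.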
12-23-2014 08:38 AM
CPU running at 50% is okay, but what was it before the incident?
I am not quite sure about the packets you see in the sniffer. What are the sources of that traffic?
Is it all from a particular MAC address?
Also share the output of the following:
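One way to answer the "particular MAC address" question from a capture is to tally frames per source MAC. This is a minimal sketch that assumes the capture has already been exported as (source MAC, destination) pairs; the MAC addresses in the example are hypothetical.

```python
from collections import Counter


def top_talkers(frames, n=3):
    """Count captured frames per source MAC (case-insensitive), busiest first.

    frames: iterable of (src_mac, dst) tuples exported from the sniffer.
    """
    return Counter(src.lower() for src, _dst in frames).most_common(n)


def dominant_share(frames):
    """Fraction of all captured frames sent by the single busiest source MAC."""
    frames = list(frames)
    if not frames:
        return 0.0
    (_mac, count), = top_talkers(frames, 1)
    return count / len(frames)


if __name__ == "__main__":
    capture = [("aaaa.bbbb.0001", "224.0.0.2")] * 8 + [("AAAA.BBBB.0002", "224.0.0.2")] * 2
    print(top_talkers(capture))
    print(f"busiest source: {dominant_share(capture):.0%} of frames")
```

If one MAC dominates, its entry in the MAC address table points towards the port (and closet) to look at; if many sources are all inflated at once, that leans more towards a loop replicating everything.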
sh ver
sh proc cpu sorted | ex 0.00%
sh mod
12-23-2014 01:33 AM
Carl,
Some of the symptoms you describe are consistent with a switching loop indeed.
One of the most important clues is the number of ports err-disabled due to loopback. The "loopback" cause for err-disabling a port is, to the best of my knowledge, always related to a switch receiving back its own LOOP frame. A LOOP frame is sent every 10 seconds out of each switchport, and is both sourced from and destined to the MAC address of the port from which it was sent; in other words, the source and destination MAC addresses of a LOOP frame are identical and set to the MAC address of the switchport that originated the frame. A neighboring switch receiving such a LOOP frame would, ordinarily, never send the frame back, because that would mean forwarding the frame back out the very port on which it was received, and switches should never do that.
So for a switch to actually receive its own LOOP frame back on the port it was originally sent from, there would have to be a switching loop somewhere in the network, or the neighboring switch would have to undergo some strange moment of "mind-blindness" that caused it to forward a frame back out the ingress port. The cause for this is so far unknown.
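To illustrate the reasoning above, here is a toy model, not Cisco's implementation. It assumes every switch floods the LOOP frame out of all ports except the one it arrived on (roughly what happens during a storm, when MAC tables are flapping) and caps the hop count because Layer 2 frames have no TTL. The switch names and port numbers are hypothetical.

```python
from collections import deque


def build_links(pairs):
    """links[(switch, port)] -> (peer_switch, peer_port), stored in both directions."""
    links = {}
    for a, b in pairs:
        links[a] = b
        links[b] = a
    return links


def loop_frame_returns(links, ports_by_switch, origin, max_hops=8):
    """Flood a LOOP frame sent out of `origin`; True if it comes back in on `origin`.

    A switch never floods a frame back out of its ingress port, so the frame can only
    return to the sender's port if there is a loop somewhere behind the neighbor.
    """
    queue = deque([(links[origin], 1)])  # (arrival (switch, port), hop count)
    while queue:
        (switch, in_port), hops = queue.popleft()
        if (switch, in_port) == origin:
            return True  # sender sees its own source MAC: err-disable with "loopback"
        if hops >= max_hops:
            continue  # stand-in for "circulates forever"
        for out_port in ports_by_switch[switch]:
            if out_port != in_port and (switch, out_port) in links:
                queue.append((links[(switch, out_port)], hops + 1))
    return False


if __name__ == "__main__":
    ports = {"A": [0], "B": [0, 1, 2], "C": [0, 1]}
    healthy = build_links([(("A", 0), ("B", 0)), (("B", 1), ("C", 0))])
    looped = build_links([(("A", 0), ("B", 0)), (("B", 1), ("C", 0)), (("B", 2), ("C", 1))])
    print(loop_frame_returns(healthy, ports, ("A", 0)))
    print(loop_frame_returns(looped, ports, ("A", 0)))
```

In the healthy tree the frame dies out at C; adding a second, unblocked B-C link is enough for the frame to come back in on A's port 0.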
A number of questions and recommendations - please try to respond to each one of them:
Best regards,
Peter
12-23-2014 01:34 AM
Hello
My initial thought would indeed be a broadcast storm or STP loop; this can bring down a network, and you also mentioned loopback err-disabled interfaces.
You don't say what kind of precautions are in place to prevent such issues, but manually defining STP roots and port security would be the way to go.
Is it possible a loop could be introduced into your network by an end user?
I was once unfortunate enough to have a really big outage related to a loop, and I also wasn't able to isolate the problem or access any devices to troubleshoot.
So what I did was, from the core switches, manually disconnect each interconnect to my distribution closets while at the same time monitoring the network via ICMP.
My thought process was that when the loop was broken, ICMP would return, and I could then trace down the root cause via the interconnect that broke the loop.
This did indeed work for me that time, and I was fortunate that it isolated the offending unmanaged network device causing the outage. That was on a 3Com network, and none of the precautions stated above were in place.
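Why manually defining the STP root matters: the root is elected purely on the lowest bridge ID (priority first, then MAC address), so with default priorities an old access switch with a low MAC can become root. This toy model uses hypothetical switch names and MACs and ignores the per-VLAN system ID extension, which adds the same VLAN number to every bridge's priority and so does not change the outcome.

```python
def bridge_id(priority: int, mac: str) -> tuple[int, int]:
    """Bridge ID as a comparable tuple: priority first, then the MAC as an integer."""
    return priority, int(mac.replace(".", ""), 16)


def elect_root(bridges: dict[str, tuple[int, str]]) -> str:
    """Return the name of the bridge with the lowest bridge ID."""
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


if __name__ == "__main__":
    bridges = {
        "core1": (32768, "0019.aaaa.0001"),    # default priority
        "core2": (32768, "0019.aaaa.0002"),
        "access7": (32768, "0000.1111.0001"),  # lower MAC, so it wins the tie
    }
    print(elect_root(bridges))
    bridges["core1"] = (4096, "0019.aaaa.0001")  # lowered priority on the core
    print(elect_root(bridges))
```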
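One reading of Paul's procedure, disconnecting interconnects one at a time and checking whether ICMP comes back, can be sketched as below. The disconnect, reconnect and reachability callables and the interface names are hypothetical placeholders for whatever is done by hand (shutting a port, pulling a cable, watching a ping).

```python
from typing import Callable, Iterable, Optional


def isolate_loop(uplinks: Iterable[str],
                 disconnect: Callable[[str], None],
                 reconnect: Callable[[str], None],
                 network_recovers: Callable[[], bool]) -> Optional[str]:
    """Disconnect one interconnect at a time until the network (ICMP) recovers.

    Returns the link that broke the loop and leaves it disconnected, or None if no
    single interconnect explains the outage (every link is restored in that case).
    """
    for link in uplinks:
        disconnect(link)
        if network_recovers():
            return link  # the loop is somewhere behind this interconnect
        reconnect(link)
    return None


if __name__ == "__main__":
    down = set()
    links = ["Te1/1", "Te1/2", "Te2/1"]  # hypothetical core-to-closet interconnects
    print(isolate_loop(links, down.add, down.discard, lambda: "Te1/2" in down))
```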
res
paul
12-23-2014 02:44 AM
Dear Carl,
This does seem to be due to traffic hitting the CPU.
Please send me the output of the following from the core switch:
sh ver
sh mod
sh proc cpu sorted | ex 0.00%
Thanks,
M