01-12-2012 05:46 AM - edited 03-07-2019 04:18 AM
Good day,
We've had an issue with our network. We have two 6509s connected redundantly, which connect to two 4900 switches, which in turn connect to an ESX chassis used for virtualization. The ESX chassis stopped working, and the 4900 switches and the main core were overloaded, although they held up. To stop the overload, one of the links to the ESX chassis was disconnected from one of the 4900 switches. CPU usage on the 4900s and the core (6509) dropped below 40%, and the virtual servers were then migrated from that chassis to two other chassis that were added right afterwards. Everything was working well, but then the 6509 suddenly switched over to the other supervisor.
We are wondering what could have caused this: the virtual server migrations, or perhaps the overload from the ESX chassis?
We also have a few questions. Is there any need to reload the cores every few months as a planned task? The cores have been up for more than a year. And is there any tool to monitor CPU status, or the overall status, of the cores and the switches?
Kind regards.
01-12-2012 05:50 AM
You could use Nagios (or a derivative such as Centreon, which is more user-friendly). There are many plugins for monitoring Cisco hardware.
01-12-2012 06:06 AM
I'm going to ask the obvious here. I'm a little concerned that you have CPU overload conditions. The 6500 series chassis, while older, are pretty robust and very reliable, even in high-bandwidth, high-CPU scenarios. Are you sure there are no network issues causing the high CPU usage?
I agree with tkatsiaounis that Nagios would be a good choice, but any SNMP-based monitoring program can easily poll stats for interfaces, CPU, memory, errors, etc.
Thanks,
Sean Brown
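As an illustration of the SNMP polling suggested above, here is a minimal sketch of a Nagios-style CPU check in Python. It is not an existing plugin. It assumes the net-snmp `snmpget` tool is installed and that the switch exposes `cpmCPUTotal5minRev` from CISCO-PROCESS-MIB; the trailing index `.1` on the OID, the function names, and the 80% critical threshold are assumptions for illustration (the 40% warning level simply echoes the figure mentioned in the original post).

```python
import subprocess

# cpmCPUTotal5minRev from CISCO-PROCESS-MIB. The trailing index (.1) is an
# assumption: it varies by platform and supervisor slot, so walk the table first.
CPU_5MIN_OID = "1.3.6.1.4.1.9.9.109.1.1.1.1.8.1"

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def poll_cpu(host, community):
    """Return the 5-minute CPU percentage using net-snmp's snmpget (SNMP v2c)."""
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, CPU_5MIN_OID],
        capture_output=True, text=True, timeout=10, check=True,
    )
    # -Oqv prints only the value, e.g. "12".
    return int(result.stdout.strip())


def classify(cpu_percent, warn=40, crit=80):
    """Map a CPU percentage to a (Nagios exit code, status message) pair."""
    if cpu_percent is None or not 0 <= cpu_percent <= 100:
        return UNKNOWN, "CPU UNKNOWN - no valid reading"
    if cpu_percent >= crit:
        return CRITICAL, f"CPU CRITICAL - 5min average {cpu_percent}%"
    if cpu_percent >= warn:
        return WARNING, f"CPU WARNING - 5min average {cpu_percent}%"
    return OK, f"CPU OK - 5min average {cpu_percent}%"
```

A wrapper script would call `poll_cpu()` with the switch address and read-only community string, print the message returned by `classify()`, and exit with the returned code so that Nagios or Centreon can alert on it.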
01-12-2012 07:40 AM
Hi,
I suggest opening a TAC case to troubleshoot the two 6509s. The logs contain valuable output that might help explain what happened and whether the high CPU utilization was a cause or an effect of the outage.
Also, the non-volatile RAM holds information about supervisor switchovers.
However, this kind of post-mortem analysis is too long and complicated to perform on the forum, as it requires many cross-checks between logs from different platforms.
Riccardo