Solved: Re: CPU Utilization Problem on Cisco Switches

jain.manish94 · ‎03-14-2018

Hello Team,

Please help with this CPU utilization problem.

Switch#sh processes cpu sorted 5sec

Core 0: CPU utilization for five seconds: 10%/5%; one minute: 9%; five minutes: 10%
Core 1: CPU utilization for five seconds: 4%; one minute: 4%; five minutes: 5%
Core 2: CPU utilization for five seconds: 4%; one minute: 3%; five minutes: 4%
Core 3: CPU utilization for five seconds: 1%; one minute: 2%; five minutes: 3%

My questions are:

1. Why are there Core 0 to Core 3? (I have two stacked switches.)

2. Core 0: CPU utilization for five seconds: 10%/5%; one minute: 9%; five minutes: 10% ----- what is the meaning of 10%/5% here?

Please do not share PDF links; I already have many. I need someone who can share practical knowledge from their own experience.

Mark Malone · ‎03-14-2018

Hi
There is no issue here; the CPUs are very low, hardly running at all. Is this from an IOS-XE switch like a 3650/3850? Their architecture is a 4-core setup per switch, unlike the older IOS switches/routers.
It's giving you the average for the command you ran. Not sure why you would limit it to a 5-second check, though; you asked the CPU to tell you the average for each core over 5 seconds:
sh processes cpu sorted 5sec

EduardR · ‎03-14-2018

Hi!

In short:

1. There are Core 0 to Core 3 because each of your switches has 4 cores. The output shows only the cores of the master switch. If you need to check the slave, you must connect to that specific stack member:

session stack-member-number

2. The meaning of those numbers is:

CPU utilization for five seconds: X%/Y%; one minute: Z%; five minutes: W%

X = Average total utilization during last five seconds (interrupts + processes)
Y = Average utilization due to interrupts, during last five seconds
Z = Average total utilization during last minute
W = Average total utilization during last five minutes
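
To make the X/Y/Z/W breakdown concrete, here is a minimal Python sketch that parses one summary line in the format quoted in this thread. The function name `parse_cpu_line` and the dictionary keys are made up for illustration, and the pattern assumes the exact wording shown above; other IOS or IOS-XE releases may print the line differently. The process-level figure is derived as X minus Y, following the definition of X as interrupts plus processes.

```python
import re

# Pattern for one summary line as quoted in this thread (an assumption about
# the format). The "Core N: " prefix and the "/Y%" interrupt part are both
# optional, because only Core 0 shows the interrupt figure in that output.
CPU_LINE = re.compile(
    r"(?:Core (?P<core>\d+): )?"
    r"CPU utilization for five seconds: (?P<total>\d+)%(?:/(?P<intr>\d+)%)?; "
    r"one minute: (?P<one_min>\d+)%; "
    r"five minutes: (?P<five_min>\d+)%"
)


def parse_cpu_line(line):
    """Return the fields of a CPU summary line as a dict, or None if no match."""
    m = CPU_LINE.search(line)
    if m is None:
        return None
    core = m.group("core")
    intr = m.group("intr")
    total = int(m.group("total"))
    return {
        "core": int(core) if core is not None else None,
        "five_sec_total": total,                                  # X
        "five_sec_interrupt": int(intr) if intr is not None else None,  # Y
        # X = interrupts + processes, so process-level load is X - Y.
        "five_sec_process": total - int(intr) if intr is not None else None,
        "one_min": int(m.group("one_min")),                       # Z
        "five_min": int(m.group("five_min")),                     # W
    }


line = ("Core 0: CPU utilization for five seconds: 10%/5%; "
        "one minute: 9%; five minutes: 10%")
print(parse_cpu_line(line))
```

For the Core 0 line from the original question, this gives a total of 10, an interrupt share of 5, and therefore 5 left for processes.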

jain.manish94 · ‎03-14-2018

Hello Team,

Thanks for your reply.

Can you please let me know why interrupts occur and how to fix that problem?

What are the causes of high CPU utilization in a network? Please give me some of the major problems and tell me how to fix them.

EduardR · ‎03-15-2018

Interrupts are not inherently a problem. In the CPU world (quoting from Wikipedia), "an interrupt is a signal to the processor emitted by hardware or software indicating an event that needs immediate attention. An interrupt alerts the processor to a high-priority condition requiring the interruption of the current code the processor is executing". In the networking device world, an interrupt can be a port that has changed state (up/down), events in the routing protocols, events in the TCAM tables, and so on. It is just a signal that the hardware sends to the CPU to say that something has happened and needs attention.

On the other hand, high utilization can be caused by many situations: layer 2 loops, layer 3 loops, STP storms, broadcast storms, or excess packets sent from the switching platform to the routing engine, and each situation must be analyzed individually. However, the switches can run at up to 40% (or higher) with no issues; they are designed to run at that rate. Your utilization is barely noticeable and you don't need to worry about it.
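
As a rough illustration of that comparison, the Python sketch below checks per-core five-minute averages against the 40% figure. The ceiling is only the rule of thumb from this post, not a documented Cisco limit, and the name `cores_above_ceiling` is hypothetical.

```python
# Rule-of-thumb ceiling taken from the post above; not a Cisco-documented limit.
CPU_OK_CEILING = 40


def cores_above_ceiling(five_min_by_core, ceiling=CPU_OK_CEILING):
    """Return the core numbers whose five-minute average exceeds the ceiling."""
    return sorted(core for core, pct in five_min_by_core.items() if pct > ceiling)


# Five-minute averages from the output quoted at the top of this thread.
reported = {0: 10, 1: 5, 2: 4, 3: 3}
print(cores_above_ceiling(reported))  # prints []
```

None of the four cores comes anywhere near the ceiling, which matches the conclusion that there is nothing to worry about here.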

jain.manish94 · ‎03-15-2018

Thanks for the reply.

How can these types of problems be resolved in a network?

Are there any commands to find out about them and solve them?

EduardR · ‎03-16-2018

There is no single command to fix all problems; it depends on the type of failure you are experiencing. For example, say you have high CPU utilization, and when you check the log you see a lot of TCN notifications or MAC flapping messages. Then you must check STP to identify the ports where the flapping is happening. From there, you may need to jump to another switch to check something on it, or you may find two ports connected to the same device without proper configuration and shut one of them down.

It depends on each case; there is no universal command for every type of problem. Maybe the only universal action is a reload, but it will not stop the issue, and you do not want reloads in your production environment.
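
The log check described above can be partly automated once the log has been saved to a file. The Python sketch below counts MAC flapping messages per port pair, assuming the messages look like the `MACFLAP_NOTIF` syslog lines commonly seen on Catalyst switches; the exact wording can differ by platform and release, so treat the pattern as an assumption. The sample lines and the function name `count_flapping_port_pairs` are invented for illustration.

```python
import re
from collections import Counter

# Assumed message shape; verify it against the log of your own platform.
MACFLAP = re.compile(
    r"MACFLAP_NOTIF: Host (?P<mac>\S+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<a>\S+) and port (?P<b>\S+)"
)


def count_flapping_port_pairs(log_lines):
    """Count MAC flap messages per unordered pair of ports."""
    counts = Counter()
    for line in log_lines:
        m = MACFLAP.search(line)
        if m:
            # Sort the two ports so A/B and B/A count as the same pair.
            pair = tuple(sorted((m.group("a"), m.group("b"))))
            counts[pair] += 1
    return counts


# Invented sample lines, for illustration only.
sample_log = [
    "%SW_MATM-4-MACFLAP_NOTIF: Host 0011.2233.4455 in vlan 10 "
    "is flapping between port Gi1/0/1 and port Gi1/0/2",
    "%SW_MATM-4-MACFLAP_NOTIF: Host 0011.2233.4455 in vlan 10 "
    "is flapping between port Gi1/0/2 and port Gi1/0/1",
    "%LINK-3-UPDOWN: Interface GigabitEthernet1/0/3, changed state to up",
]
for pair, n in count_flapping_port_pairs(sample_log).most_common():
    print(pair, n)
```

The port pair with the highest count is a reasonable place to start the STP check, but it only points at where to look; it does not fix anything by itself.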

Joseph W. Doherty · ‎03-16-2018

First, understand that switches often shouldn't have high CPU utilization, even when passing a lot of traffic, because transit traffic is normally handled by dedicated hardware. A switch's CPU is generally used for everything but transit traffic.

However, sometimes, for different reasons, a switch's special hardware will pass transit traffic to the CPU for processing. This is something that's also generally a "very bad thing".

Eduard showed that, for the five-second interval, the output displays both overall CPU utilization and interrupt CPU utilization. Eduard also explained what an "interrupt" means, but on Cisco platforms, interrupt CPU often includes the CPU used for processing transit traffic optimally. Non-interrupt CPU generally represents non-transit-traffic processing and non-optimal transit traffic processing.

On a software based router, you want to see interrupt CPU closely track the overall utilization.

On a switch, you would expect interrupt utilization to account for little of the CPU utilization.

Your switch's CPU utilization, both overall and interrupt, seems very reasonable.