07-21-2010 09:41 PM - edited 03-06-2019 12:08 PM
My monitoring tool is reporting alerts for high CPU utilization on a Nexus 5010. The image is 4.1(3)N1(1).
The only command supported on this code is sh proc cpu, and its output does not really show the current CPU utilization. How do I troubleshoot the cause of high CPU on Nexus switches?
Any info will be much appreciated.
thx
08-05-2010 07:22 PM
Hi,
"show system resources" is the command you are looking for. Together with "show proc cpu", it will help you troubleshoot high CPU.
JayaKrishna
08-07-2010 04:05 PM
I have the same experience observing frequent high CPU on Nexus 5010 and 5020, while there isn't a significant amount of traffic.
No command seems to be able to pinpoint the process consuming the CPU.
Is anybody else observing this? So far traffic forwarding has been functional, but occasionally the command prompt was very slow to respond. I'd appreciate any definitive information on this questionable symptom.
5020-access# sh proc cpu hist
11 1
1 1 1 00901 1
8 3 8 1 918 1 8 2 81 6 304900606 12
100 ####
90 ####
80 ####
70 ####
60 ####
50 ####
40 ####
30 ####
20 # #####
10 # # # # # # # # # ###### #
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per second (last 60 seconds)
# = average CPU%
111111111111111111111111111111111111111111111111111111111111
000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000
100 ************************************************************
90 ************************************************************
80 ************************************************************
70 ************************************************************
60 ************************************************************
50 ************************************************************
40 ************************************************************
30 ************************************************************
20 ************************************************************
10 *********##**********#**#*******#******#***#**********#*****
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per minute (last 60 minutes)
* = maximum CPU% # = average CPU%
08-09-2010 06:28 PM
Hi,
Can you post the output of "show proc cpu sort" from the switch that is seeing this symptom? Do you have SNMP configured on this switch? If so, can you turn it off and monitor the CPU?
JayaKrishna
08-10-2010 03:22 PM
I tried turning off SNMP, with no obvious difference. "show proc cpu sort" never showed any runaway process.
I am starting to question whether this is a real CPU issue or a faulty display. Why would the last 60 minutes always show a high peak while the last 72 hours show a very low peak?
5010-sw2# sh proc cpu sort
PID Runtime(ms) Invoked uSecs 1Sec Process
----- ----------- -------- ----- ------ -----------
3759 2348 5266079 0 4.0% pfma
1 1444 337537 4 0.0% init
# sh proc cpu hist
1
1 1 302 1 1
1 84 1 81 11 84709 1 82 4 3 731 1 1 81 1
100 #
90 #
80 #
70 #
60 #
50 #
40 ##
30 ###
20 # ### #
10 ## # # ### ## # #
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per second (last 60 seconds)
# = average CPU%
1 111 11 111111111111111111111111111111
999999898889999909900099990099000000000000000000000000000000
766632509978555608900042550088000000000000000000000000000000
100 **** *********** ************************************
90 ************************************************************
80 ************************************************************
70 ************************************************************
60 ************************************************************
50 ************************************************************
40 ************************************************************
30 ************************************************************
20 ************************************************************
10 #**********#**********#**********#**********#**********#****
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per minute (last 60 minutes)
* = maximum CPU% # = average CPU%
1
781777778779888777877798777768778689777888888897768787788779787768978677
100
90
80
70
60
50
40
30
20
10 ######**####*##########################*################****************
0....5....1....1....2....2....3....3....4....4....5....5....6....6....7.
0 5 0 5 0 5 0 5 0 5 0 5 0
CPU% per hour (last 72 hours)
* = maximum CPU% # = average CPU%
01-04-2011 08:23 PM
Did anyone find the cause of this? I've been observing this behavior on all the Nexus 5010s on my network.
01-05-2011 03:29 PM
This is likely a related bug: http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCth19083
Although the bug description does not say that it only occurs with DCNM, DCNM is probably the only cause of frequent opening and closing of SSH sessions on the 5k. In our case, disabling monitoring of the 5k by DCNM was the fix. Check whether you have a DCNM system. If not, try temporarily disabling and re-enabling SSH to see if it's the cause.
Note the bug is fixed in release 5.
Regards,
sean
10-29-2012 12:33 PM
I have a similar situation with some Nexus 5020s. "show proc cpu history" indicates high CPU utilization when looking at the max value, but the average is 10% or lower. I opened a TAC case and the engineer indicated that this is common on the 5K platform.
I'm paraphrasing here: NX-OS is Linux based, and low-priority processes are allowed to run the processor up to 100% for very short durations to keep it clear for high-priority processes. That is why the average (in my case) is always very low but the max values can reach 100% during many of the one-minute intervals displayed in the "show proc cpu history" command output. The TAC engineer also indicated that, unless average processor utilization exceeds 50% on a regular basis, there really is not an issue.
I did not realize this condition existed until a new implementation of LMS began receiving traps for high CPU utilization from the Nexus 5020s. Based on TAC's response to my case, I'm no longer concerned about the max values I'm seeing, but I'll be monitoring average CPU% as a more meaningful indicator.
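For anyone who wants to apply the same rule in their own monitoring, here is a minimal sketch of alerting on the average rather than the max. The 50% threshold comes from the TAC guidance above; the function name, the sample values, and the interpretation of "on a regular basis" as "at least half of the samples" (`min_fraction`) are assumptions for illustration only.

```python
from typing import Sequence


def sustained_high_average(samples: Sequence[float],
                           threshold: float = 50.0,
                           min_fraction: float = 0.5) -> bool:
    """Return True when CPU% exceeds `threshold` in at least
    `min_fraction` of the samples (assumed meaning of "regularly")."""
    if not samples:
        return False
    over = sum(1 for value in samples if value > threshold)
    return over / len(samples) >= min_fraction


# Hypothetical per-minute readings: peaks hit 100% but averages stay low.
max_per_minute = [100, 100, 97, 100, 100, 100]
avg_per_minute = [8, 10, 7, 9, 12, 10]

print(sustained_high_average(max_per_minute))  # True: alerting on max fires
print(sustained_high_average(avg_per_minute))  # False: alerting on average stays quiet
```

Feeding the max values into the same check shows why a max-based alert fires constantly while an average-based one does not.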
01-05-2011 08:31 PM
I would agree that it sounds like a monitoring system of some sort causing it. Because it lasts for such a short period of time, you are unlikely to catch it with the "sh proc cpu sorted" command. I've seen similar behaviour on 6500s, and while I knew for sure that it was SNMP polling, I could never actually catch it in the act because it happens so quickly.
Keep in mind that a lot of these commands aren't necessarily all that exact either ;-). It's also extremely hard to find out exactly how many of these statistical "show" commands actually work, since they generate their data from different polling cycles (depending on who wrote the application), and the exact information is proprietary.
01-09-2011 05:01 PM
That's exactly it. "show proc cpu" has limitations. Even running it with an automated script did not produce conclusive results. However, it was useful to analyze the patterns with "show proc cpu history". If the CPU spikes periodically, it is likely in sync with DCNM polling. See how the pattern changes when you change the DCNM polling interval.
sean
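The pattern check described above can be sketched roughly as follows: take a series of CPU% samples, find where spikes start, and see whether the gaps between them are constant. This is only an illustration; the function names, the threshold, the tolerance, and the sample data are all hypothetical, and the samples would have to be collected by whatever means you have available.

```python
from statistics import median
from typing import List, Optional, Sequence


def spike_starts(samples: Sequence[float], threshold: float) -> List[int]:
    """Indices where a spike begins (sample above threshold, previous one not)."""
    starts = []
    previous_high = False
    for index, value in enumerate(samples):
        high = value > threshold
        if high and not previous_high:
            starts.append(index)
        previous_high = high
    return starts


def spike_period(samples: Sequence[float], threshold: float,
                 tolerance: float = 1) -> Optional[float]:
    """Return the typical gap between spikes if they are evenly spaced,
    otherwise None. At least three spikes are needed to judge."""
    starts = spike_starts(samples, threshold)
    if len(starts) < 3:
        return None
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    typical = median(gaps)
    if all(abs(gap - typical) <= tolerance for gap in gaps):
        return typical
    return None


# Hypothetical series: idle at 5%, with a spike every 11 samples.
samples = [5.0] * 60
for position in range(0, 60, 11):
    samples[position] = 95.0

print(spike_period(samples, threshold=50))
```

If the reported period follows the poller's interval when you change it, that points at the poller rather than at the switch.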
01-10-2012 06:25 AM
I tried using the command "sh proc cpu hist" to see the overall CPU utilization on one of my 5010s, but that command doesn't work. Our monitoring, however, keeps alerting that it is running above 95%. Before we open a TAC case I want to see for myself that this specific switch is spiking. There is also nothing in the logs. The version is 4.1(3)N2(1).
01-11-2012 07:51 AM
Douglas,
Have you looked at these two bugs? We had a similar issue:
CSCte81951 -- show system resources does not show correct cpu utilization
CSCth08102 -- Gatos XL/Carmel: CPU states shows "nan% user" instead of numbers
thks,
Al
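If a script is scraping the CPU figures, it may be worth guarding against the "nan%" symptom from the second bug title before trusting the numbers. Below is a rough sketch; the exact layout of the "CPU states" line is assumed here (percentage followed by a label such as user, kernel, or idle), and the function names are made up for illustration.

```python
import math
import re
from typing import Dict

# Matches e.g. "1.5% user" or "nan% user" (line layout is an assumption).
_FIELD = re.compile(r"(nan|\d+(?:\.\d+)?)%\s+(\w+)")


def parse_cpu_states(line: str) -> Dict[str, float]:
    """Map each label to its percentage; 'nan' becomes float('nan')."""
    return {label: float(value) for value, label in _FIELD.findall(line)}


def is_trustworthy(states: Dict[str, float]) -> bool:
    """False when nothing was parsed or any field is NaN."""
    return bool(states) and not any(math.isnan(v) for v in states.values())


# Assumed sample lines, not captured from a real switch.
good = "CPU states  :   1.5% user,   2.0% kernel,  96.5% idle"
bad = "CPU states  :   nan% user,   nan% kernel,   nan% idle"

print(parse_cpu_states(good), is_trustworthy(parse_cpu_states(good)))
print(is_trustworthy(parse_cpu_states(bad)))
```

A poller that skips untrustworthy readings avoids raising alerts on values the switch never actually reported.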
11-01-2012 12:10 AM
Hi
I have tried to compile the related bugs that may cause high CPU utilization on the Nexus 5000.
Please refer to: Troubleshooting High CPU Utilization on Nexus 5010
Please rate the correct answer and the document if you find them useful.
Cheers
Sivagami.N
06-11-2018 09:25 AM
https://supportforums.cisco.com/t5/network-management/cpu/m-p/3079496/highlight/false#M113815
Please visit this link; it may help you!
06-14-2018 11:41 PM
Actually, there is a limitation/restriction on the Sup8-E board. When you use the Sup8-E in either RPR or SSO mode, only the first four uplinks on each supervisor engine are available. The second set of four uplinks is unavailable.
Regarding the uplink bandwidth: when the daughter card is activated, the Supervisor Engine 8-E baseboard uplink bandwidth is restricted to 40G as the default configuration in a ten-slot chassis.
In non-redundancy mode, the supervisor can support the first 4 active interfaces.
In redundancy mode, the first two interfaces on both the active and the standby supervisors become active.
In your case, you have a redundant sup installed and you see ports 1-4 as active and the remaining ports 5-8 as disabled. Since you are using dual sups, you should usually see the first 2 ports on each sup in the active/up state.
What you see in your situation is expected.
Here is a useful CCO link for your reference. Please go through it:
http://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst4500/XE3-7-0E/15-23E/configuration/guide/xe-370-configuration/sw_int.html#pgfId-1236145