Cisco Nexus 5548 High CPU + Telnet unreachable

nexylan01 · 12-31-2016

Hello everyone,

We recently had a network migration project to replace a Cisco Catalyst 6500 (SUP720-3BXL) with a Nexus 5548 (+ L3 daughter card).

But the migration did not go well, mainly because of performance issues on the Nexus 5K. Let me explain...

Almost as soon as the Nexus went live, packet forwarding/routing was fine, but administration tasks were almost impossible to execute:

sh run: took 2-3 minutes
sh run int eth1/1: 1-2 minutes
copy running-config startup-config: 5 minutes
telnet: Process did not respond within the expected timeframe, please try again.

At some point I couldn't even log in from the console:

Process did not respond within the expected timeframe, please try again.

A few checks:

CPU: about 30-45%
sh proc cpu: high CPU on bcm_usd
sh system resources: high user CPU
Version: 7.0(3)N1(1)
Network: about 200-300 Mbps routed from one interface to another (no NAT, no BGP, only a default route).

Does anyone have ideas on how to solve this kind of issue?

Thanks for your help,

Gaëtan

nexylan01 · 12-31-2016

I should also mention that the N5K is now "out of production" (only management is up).

But it's still using CPU:

[show processes cpu history graph; its column alignment was lost when pasted. CPU% per minute (last 60 minutes), * = maximum CPU%, # = average CPU%. The per-minute maximums read from the graph's two digit rows range from 22% to 60%, while the average (#) stays in the 10% band for the whole hour.]
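If it helps to read the graph numerically: in the usual Cisco layout, the two rows of digits printed above a `show processes cpu history` graph give each minute's maximum CPU% (tens digit on the first row, units digit on the second). The Python sketch below decodes the two rows copied from this output under that assumption; `decode_cpu_history` is a hypothetical helper, not a Cisco tool.

```python
# Sketch: decode the digit rows above a "show processes cpu history" graph.
# Assumption: row 1 holds the tens digit and row 2 the units digit of each
# minute's maximum CPU%, a blank position counts as 0, and every reading
# fits in these two rows.

TENS = "345435554544462342424242232423233234343252542343244454424533"
UNITS = "668780992126303135680542775487517834593858725951558205679779"


def decode_cpu_history(tens: str, units: str) -> list[int]:
    """Combine the tens and units rows column by column into CPU% values."""
    if len(tens) != len(units):
        raise ValueError("digit rows must have the same length")
    values = []
    for t, u in zip(tens, units):
        tens_digit = int(t) if t.strip() else 0   # blank tens digit -> 0
        units_digit = int(u) if u.strip() else 0  # blank units digit -> 0
        values.append(tens_digit * 10 + units_digit)
    return values


if __name__ == "__main__":
    maxima = decode_cpu_history(TENS, UNITS)
    print(f"minutes: {len(maxima)}")
    print(f"lowest per-minute maximum: {min(maxima)}%")
    print(f"peak: {max(maxima)}%")
```

For these two rows it reports 60 minutes, a peak of 60% and a lowest per-minute maximum of 22%.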

  PID Runtime(ms)  Invoked uSecs   1Sec Process
----- ----------- -------- ----- ------ -----------
 3303       22110   204867   107  10.4% bcm_usd
    1        5981    15909   375   0.0% init
    2           3      304    10   0.0% kthreadd
    3          13     3229     4   0.0% migration/0
    4        3434   715411     4   0.0% ksoftirqd/0
    5         660    21560    30   0.0% watchdog/0
    6          14     3245     4   0.0% migration/1
    7        2306   529788     4   0.0% ksoftirqd/1
    8          52    21560     2   0.0% watchdog/1

Load average: 1 minute: 0.12 5 minutes: 0.17 15 minutes: 0.21
Processes : 354 total, 2 running
CPU states : 7.5% user, 0.5% kernel, 92.0% idle
CPU0 states : 0.0% user, 1.0% kernel, 99.0% idle
CPU1 states : 14.9% user, 0.0% kernel, 85.1% idle
Memory usage: 8253868K total, 2519988K used, 5733880K free
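As a quick sanity check on the memory figures in this output, here is a small Python sketch that pulls the numbers out of the `Memory usage` line and computes the percentage used. It assumes the line keeps exactly this `...K total, ...K used, ...K free` format; `memory_usage` is a hypothetical helper.

```python
import re

# Sketch: parse the "Memory usage" line of NX-OS "show system resources".
# Assumption: the line uses the same "...K total, ...K used, ...K free"
# format as the output pasted in this thread.
SAMPLE = "Memory usage: 8253868K total, 2519988K used, 5733880K free"

MEM_RE = re.compile(r"(\d+)K total, (\d+)K used, (\d+)K free")


def memory_usage(line: str) -> tuple[int, int, int, float]:
    """Return (total_kb, used_kb, free_kb, percent_used) parsed from the line."""
    match = MEM_RE.search(line)
    if match is None:
        raise ValueError("not a 'Memory usage' line")
    total, used, free = (int(g) for g in match.groups())
    return total, used, free, 100.0 * used / total


if __name__ == "__main__":
    total, used, free, pct = memory_usage(SAMPLE)
    print(f"{used}K of {total}K used ({pct:.1f}%), {free}K free")
    # Sanity check: used + free should equal total.
    print("used + free == total:", used + free == total)
```

With the values pasted here, roughly 30.5% of memory is in use, and used plus free adds up exactly to the total.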

nexylan01 · 12-31-2016

Interestingly, it's actually statsclient that is consuming a lot of CPU:

# sh processes cpu | exclude 0.0%

  PID Runtime(ms)  Invoked uSecs   1Sec Process
----- ----------- -------- ----- ------ -----------
 3189     1277032   193981  6583   0.9% pfma
 3303       23293   228649   101   9.5% bcm_usd
 3367     1936710   773665  2503  32.3% statsclient

CPU util : 17.6% user, 20.5% kernel, 61.9% idle
Please note that only processes from the requested vdc are shown above
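For anyone who wants to post-process saved output offline, the sketch below is a rough Python equivalent of `sh processes cpu | exclude 0.0%` that also lists the busiest processes first. It assumes whitespace-separated columns in the order shown above; the sample reuses the three rows above plus the `init` row from the earlier listing, and `busy_processes` is a hypothetical helper.

```python
# Sketch: rough Python equivalent of "show processes cpu | exclude 0.0%",
# sorted busiest first. Assumption: whitespace-separated columns in the
# order PID, Runtime(ms), Invoked, uSecs, 1Sec, Process.
OUTPUT = """\
PID Runtime(ms) Invoked uSecs 1Sec Process
----- ----------- -------- ----- ------ -----------
3189 1277032 193981 6583 0.9% pfma
3303 23293 228649 101 9.5% bcm_usd
3367 1936710 773665 2503 32.3% statsclient
1 5981 15909 375 0.0% init
"""


def busy_processes(text: str) -> list[tuple[str, int, float]]:
    """Return (name, pid, one_sec_pct) for processes above 0.0%, busiest first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator and any short/blank lines.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        pid = int(fields[0])
        one_sec = float(fields[4].rstrip("%"))
        name = " ".join(fields[5:])
        # Compare the parsed number, not the text, so 10.0% is kept.
        if one_sec > 0.0:
            rows.append((name, pid, one_sec))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for name, pid, pct in busy_processes(OUTPUT):
        print(f"{pct:5.1f}%  {name} (PID {pid})")
```

Filtering on the parsed number rather than on text has one side benefit: a plain text match on `0.0%` also matches `10.0%` or `20.0%`, so the CLI filter can hide a process sitting at exactly 10% or 20%.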

Richard Burts · 01-02-2017

I have seen symptoms similar to what you describe when the device was having problems allocating memory. Especially since this seems to be a recent install, can you verify that the device has at least the recommended minimum amount of memory? If it does have sufficient memory, then the problem might be related to a memory leak. Assuming that logging console is enabled, would you connect to the console and watch the console messages for any reports of problems allocating memory?
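If the console session is captured to a file while waiting for the problem to recur, a small Python filter like the one below can pick out candidate lines. The keyword patterns are hypothetical examples only, not actual NX-OS message names, so they should be adjusted to whatever the device really logs; `memory_alerts` is likewise a hypothetical helper.

```python
import re
from typing import Iterable

# Sketch: pick out console lines that may report memory-allocation trouble.
# HYPOTHETICAL: these patterns are illustrative guesses, not real NX-OS
# message names; replace them with what the device actually logs.
HYPOTHETICAL_PATTERNS = [
    r"malloc",
    r"alloc(ation)?\s+fail",
    r"out of memory",
]

MEMORY_RE = re.compile("|".join(HYPOTHETICAL_PATTERNS), re.IGNORECASE)


def memory_alerts(lines: Iterable[str]) -> list[str]:
    """Return the lines (trailing newline stripped) that match any pattern."""
    return [line.rstrip("\n") for line in lines if MEMORY_RE.search(line)]
```

For example, `with open("console_capture.txt") as f: print(memory_alerts(f))` prints the matching lines; the file name is only a placeholder.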

HTH

Rick
