
High CPU on Nexus 3064PQ

Fernando Galvao
Level 1

Folks, I have a backbone ring of three Cisco Nexus 3064PQ switches running Layer 3 on NX-OS version 7. I use them for OSPF (IPv4 and IPv6), with average traffic of 11 Gbps and connections to six PPPoE BRAS. I'm seeing very high CPU usage from a few processes, such as snmpd (I use Zabbix for monitoring) and feature-mgr, shown below. When I disable SNMP, total CPU drops to 11%; with SNMP active it spikes to 70%. How can I resolve this?

PID    Runtime(ms)  Invoked     uSecs  1Sec    Process
-----  -----------  ----------  -----  ------  -----------
21440  307833657    1249531433  246    20.50%  feature-mgr
21717  180427605    1764164910  102    10.00%  snmpd
22742  92           45          2066   6.00%   ecp

7 Replies

Reza Sharifi
Hall of Fame

This is most likely a bug in the OS that causes the CPU to rise when SNMP is enabled.

Open a ticket with Cisco TAC and have them investigate.

HTH

Can I open a ticket even though the equipment is out of warranty? It has been in use for more than a year and no longer has warranty coverage. And how do I deal with this feature-mgr process?

Usually if you don't have a service contract, they will not support the device but you can always give them a call and ask.

Good luck

Andrea Testino
Cisco Employee

Fernando,

Can you share the output of the following, if possible:

show processes cpu sort | ex 0.0

show processes cpu history

show version (remove your SN here)

ethanalyzer local interface inbound-hi display-filter snmp limit-c 2000 > bootflash:SNMP.txt

ethanalyzer local interface inbound-low display-filter snmp limit-c 2000 >> bootflash:SNMP.txt

ethanalyzer local interface mgmt display-filter snmp limit-c 2000 >> bootflash:SNMP.txt

Traffic will only show in one of these, but I am not sure whether your SNMP traffic is in-band or out-of-band. Ctrl+C out of the capture as needed.

If the box is being polled heavily by multiple monitoring tools, it is not abnormal to see high CPU. With the Ethanalyzer commands, we should be able to see which IP addresses are reaching the box with SNMP traffic, and you can address that accordingly.
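As an illustration only (none of this is in your current configuration), once you know which addresses are legitimate pollers, NX-OS can bind the SNMP community to an IPv4 ACL so that requests from unexpected sources are refused. The ACL and community names below are placeholders, and depending on the release the keyword is use-ipv4acl or use-acl:

ip access-list SNMP-POLLERS
  permit udp host <zabbix-server-ip> any
snmp-server community <your-community> use-ipv4acl SNMP-POLLERS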

As far as feature-mgr goes, is it always at 20%, or did it just happen to be there when you ran the command? Feature Manager is in charge of enabling and disabling features on the switch/router (the "feature" CLI command). If it is always at a higher percentage, you can run the following two commands to get an idea of what is happening "under the hood":

show system internal feature-mgr event-history errors
show system internal feature-mgr event-history msgs
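
As a quick cross-check (just a suggestion), you can also list which features are currently enabled; if nothing is being toggled, feature-mgr should not normally stay busy:

show feature | include enabled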

Hope that helps.

- Andrea

- Andrea, CCIE #56739 R&S

Friend,


Thank you for helping me. I am attaching the output of most of the commands.

What I have observed is that when CPU usage goes up, Zabbix (which I use for monitoring) stops collecting and the graphs are left without data. I see this happen when traffic is high, but it has also happened with very low traffic. I only monitor interface traffic and CPU usage, nothing else. Interface 1/49 is 40 Gbps and connects to another Nexus 3064 running Layer 2, and that Layer 2 Nexus connects to our ASR9001 edge router. The Layer 2 Nexus carries the same traffic, but its CPU never goes above 6% and I have no problems with it, only with the one running Layer 3.

PE0-CISCO(config)# ethanalyzer local interface inband display-filter snmp limit-c 2000 > bootflash:SNMP.txt
Capturing on inband
278 packets captured

PE0-CISCO(config)# ethanalyzer local interface mgmt display-filter snmp limit-c 2000 > bootflash:SNMPmgmt.txt

PE0-CISCO(config-if)# show processes cpu sort | ex 0.0

PID    Runtime(ms)  Invoked   uSecs  1Sec    Process

-----  -----------  --------  -----  ------  -----------

21440    332073490  1348887861    246   7.00%  feature-mgr

23336    127118634  231152192    549   7.00%  t2usd

   27    408863539  649218089    629   6.00%  ksmd

22742           92        45   2066   3.00%  ecp

22801    121685168  882922885    137   3.00%  ethpm

22741      3091469  23004552    134   2.00%  eth_port_channel

13933     53348480  777243205     68   1.00%  sysmgr

21642          118        75   1582   1.00%  icmpv6

21687          285       126   2262   1.00%  netstack

22850          153       217    705   1.00%  ospf

CPU util  :   12.76% user,   12.23% kernel,   75.00% idle

Software

  BIOS: version 4.0.0

  NXOS: version 7.0(3)I5(2)

  BIOS compile time:  12/06/2016

  NXOS image file is: bootflash:///nxos.7.0.3.I5.2.bin

  NXOS compile time:  2/16/2017 8:00:00 [02/16/2017 17:03:27]

Hardware

  cisco Nexus3064 Chassis

  Intel(R) Celeron(R) CPU        P4505  @ 1.87GHz with 3903216 kB of memory.

  Processor Board ID FOC16256KTT

  Device name: PE0-CISCO

  bootflash:    1635720 kB

  usb1:               0 kB (expansion flash)

Kernel uptime is 50 day(s), 22 hour(s), 58 minute(s), 41 second(s)

Last reset at 384678 usecs after  Fri Jun  9 08:46:24 2017

PE0-CISCO(config)# sho interface ethernet 1/49

Ethernet1/49 is up

admin state is up, Dedicated Interface

  Hardware: 40000 Ethernet, address: a44c.11b8.b518 (bia a44c.11b8.b518)

  Description: PE0-CISCOxCORE-CISCO

  MTU 1500 bytes, BW 40000000 Kbit, DLY 10 usec

  reliability 255/255, txload 2/255, rxload 26/255

  Encapsulation ARPA, medium is broadcast

  Port mode is access

  full-duplex, 40 Gb/s, media type is 40G

  Beacon is turned off

  Auto-Negotiation is turned on, FEC mode is Auto

  Input flow-control is off, output flow-control is off

  Auto-mdix is turned off

  Rate mode is dedicated

  Switchport monitor is off

  EtherType is 0x8100

  EEE (efficient-ethernet) : n/a

  Last link flapped 7week(s) 1day(s)

  Last clearing of "show interface" counters never

  1 interface resets

  30 seconds input rate 4147105520 bits/sec, 407351 packets/sec

  30 seconds output rate 454472536 bits/sec, 283323 packets/sec

  Load-Interval #2: 5 minute (300 seconds)

    input rate 4.15 Gbps, 407.58 Kpps; output rate 451.21 Mbps, 283.45 Kpps

  RX

    2643555793891 unicast packets  4415412 multicast packets  50025 broadcast packets

    2643560259328 input packets  3295660538907904 bytes

    0 jumbo packets  0 storm suppression packets

    0 runts  0 giants  0 CRC  0 no buffer

    0 input error  0 short frame  0 overrun   0 underrun  0 ignored

    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop

    0 input with dribble  52407 input discard

    0 Rx pause

  TX

    1858487814940 unicast packets  2712839 multicast packets  67174 broadcast packets

    1858490594953 output packets  370623804264085 bytes

    0 jumbo packets

    0 output error  0 collision  0 deferred  0 late collision

    0 lost carrier  0 no carrier  0 babble  3476 output discard

    0 Tx pause

 

Fernando,

Anytime. Thanks for grabbing the outputs. 

Based on the Ethanalyzer capture, I see quite a few IP addresses hitting the box - One (or more) of these will be owned by this particular Nexus.

Which one of these is your Zabbix? Are the others other monitoring tools, or should they not be polling the box at all? This may be something you want to look into (see the sketch after the list):

171.50.175.87
138.204.68.137
168.205.37.17
139.59.78.197
172.16.1.97
170.79.34.9
170.83.199.1
138.204.68.141
138.204.68.2
168.205.37.254
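
If it helps to narrow this down, one option (assuming the capture-filter keyword is available on your release) is to capture SNMP traffic from one suspect address at a time, for example 138.204.68.2, and see which source fills the capture fastest:

# ethanalyzer local interface inband capture-filter "host 138.204.68.2 and udp port 161" limit-captured-frames 200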

I also see quite a few output and input discards on Eth1/49. Is this where your SNMP traffic from Zabbix would be ingressing? If so, could you clear the counters on that particular interface (command sketched below) and, once they increment again, check these:

# show hardware internal bcm-usd port-stats slot-num 0 front-port 49
# show hardware internal forwarding l3 counters | sec "Port : 49"

# show hardware internal interface indiscard-stats front-port 49
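
For completeness, and assuming standard NX-OS syntax on your release, clearing the counters on just that interface would be:

# clear counters interface ethernet 1/49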

Looks like feature-mgr on the latest output is much lower and I do not see a high SNMP utilization - Any chance you can share "show processes cpu history"?

- Andrea

- Andrea, CCIE #56739 R&S

Zabbix was monitoring the internal mgmt 0 interface with IP 10.10.10.8. For the test, I set it to monitor the IP 168.205.37.254 instead. The other IPs are needed because they are connections to the PPPoE BRAS, clients with blocks of public IPs, and so on.

The command below does not offer the bcm-usd and indiscard-stats options; what should I use instead?

PE0-CISCO(config)# show hardware internal  ?
access-list buffer cpu-mac dev-version errors forwarding interface memory-ecc mgmt0 plog sprom version
bootflash cpu dev-port-map eobc fabric inband-rcpu logflash memory-model ns sensor statistics

PE0-CISCO(config)# show hardware internal forwarding l3 counters | sec "Port : 49"
Port : 49
Counters:
IfInOctets = 3386558104664949
IfInUcastPkts = 2715412006487
IfInNUcastPkts = 4525590
IfInDiscards = 52407
IfOutOctets = 379946477639562
IfOutUcastPkts = 1909160518313
IfOutNUcastPkts = 2814748
IfOutDiscards = 3476
IpInReceives = 2700090377270
IpForwDatagrams = 1908980314407
Dot1dTpPortInFrames = 2715416532077
Dot1dTpPortOutFrames = 1909163333061
EtherStatsMulticastPkts = 7218837
EtherStatsBroadcastPkts = 121501
EtherStatsPkts64Octets = 227838393012
EtherStatsPkts65to127Octets = 1701064400028
EtherStatsPkts128to255Octets = 104504672589
EtherStatsPkts256to511Octets = 85371944424
EtherStatsPkts512to1023Octets = 65372539369
EtherStatsPkts1024to1518Octets = 2440427915716
EtherStatsOctets = 3766504582304511
EtherStatsPkts = 4624579865138
EtherStatsTXNoErrors = 1909163333061
EtherStatsRXNoErrors = 2715416532079
IfInBroadcastPkts = 50027
IfInMulticastPkts = 4475563
IfOutBroadcastPkts = 71474
IfOutMulticastPkts = 2743274
BcmReceivedPkts64Octets = 37522543182
BcmReceivedPkts65to127Octets = 238786079701
BcmReceivedPkts128to255Octets = 58657367631
BcmReceivedPkts256to511Octets = 47702605621
BcmReceivedPkts512to1023Octets = 41092873211
BcmReceivedPkts1024to1518Octets = 2291655062731
BcmTransmittedPkts64Octets = 190315849830
BcmTransmittedPkts65to127Octets = 1462278320327
BcmTransmittedPkts128to255Octets = 45847304958
BcmTransmittedPkts256to511Octets = 37669338803
BcmTransmittedPkts512to1023Octets = 24279666158
BcmTransmittedPkts1024to1518Octets = 148772852985
bcmDbgCntTIPD4 = 10860116092
bcmDbgCntTL2_MTU = 3

PE0-CISCO(config)# show processes cpu history

[The three ASCII CPU-history graphs lost their column alignment when pasted into the forum. What is still legible: "CPU% per second (last 60 seconds)" (# = average CPU%), "CPU% per minute (last 60 minutes)" and "CPU% per hour (last 72 hours)" (* = maximum CPU%, # = average CPU%). The per-minute and per-hour maximums sit mostly in the 60-100% range, with the hourly maximum frequently reaching 90-100%, while the averages stay roughly in the 20-40% band.]

The Zabbix polling IP is reached over port 1/49, under a VLAN:

interface Vlan1100
  description PE0-CISCOxLAN-CORE-CISCO
  no shutdown
  management
  no ip redirects
  ip address 168.205.37.254/30
  ip address 168.205.37.14/30 secondary
  ipv6 address 2804:2728:0:1::2/64
  ip router ospf 1 area 0.0.0.0
  ipv6 router ospfv3 1 area 0.0.0.0

I have not cleared the VLAN 1100 counters, because on this Layer 3 switch the counters are not working; even Zabbix shows no traffic for it. The Layer 2 Nexus 3064 carries the VLAN traffic normally.