07-28-2017 09:30 AM - edited 03-08-2019 11:31 AM
Folks, I have a backbone ring of three Cisco Nexus 3064PQ switches running in layer 3 on NX-OS version 7. I use it for OSPF over IPv4 and IPv6, with average traffic of 11 Gbps and connections to 6 PPPoE BRAS. I'm noticing very high CPU usage from some processes, such as snmpd (I use Zabbix to monitor the box) and feature-mgr, shown below. When I disable SNMP, total CPU drops to 11%; with SNMP active, it spikes to 70%. How can I resolve this?
PID Runtime(ms) Invoked uSecs 1Sec Process
----- ----------- -------- ----- ------ -----------
21440 307833657 1249531433 246 20.50% feature-mgr
21717 180427605 1764164910 102 10.00% snmpd
22742 92 45 2066 6.00% ecp
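(For reference, output like the table above comes from show processes cpu sort; show system resources gives the overall user/kernel/idle split.)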
07-28-2017 11:02 AM
This is most likely a bug in the OS that causes the CPU to rise when SNMP is enabled.
Open a ticket with Cisco TAC and have them investigate.
HTH
07-28-2017 11:12 AM
Can I open a ticket even though the equipment is out of warranty? It has been in use for more than a year and no longer has a warranty. And how do I deal with this feature-mgr process?
07-28-2017 12:40 PM
Usually, if you don't have a service contract, they will not support the device, but you can always give them a call and ask.
Good luck
07-29-2017 09:50 AM
Fernando,
Can you share the output to the following if possible:
show processes cpu sort | ex 0.0
show processes cpu history
show version (remove your SN here)
ethanalyzer local interface inbound-hi display-filter snmp limit-c 2000 > bootflash:SNMP.txt
ethanalyzer local interface inbound-low display-filter snmp limit-c 2000 >> bootflash:SNMP.txt
ethanalyzer local interface mgmt display-filter snmp limit-c 2000 >> bootflash:SNMP.txt
Traffic will only show in one of these but I am not sure if your SNMP traffic is inband or out of band. Ctrl+C out of this as needed.
If the box is being polled heavily by multiple monitoring tools, it is not abnormal to see high CPU - with the Ethanalyzer commands, we should be able to see which IP addresses are sending SNMP traffic to the box, and you can address that accordingly.
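Once the captures finish, you can review them on-box or pull them off the switch - for example (standard NX-OS file commands; substitute your own TFTP server):
show file bootflash:SNMP.txt
copy bootflash:SNMP.txt tftp://<your-tftp-server>/SNMP.txt vrf management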
As far as feature-mgr, is it always at 20%, or did it just happen to be when you ran the command? Feature Manager is in charge of enabling/disabling features on the switch/router (the "feature" CLI command). If it is always at a higher percentage, you can try running the following two commands to get an idea of what is happening "under the hood":
show system internal feature-mgr event-history errors
show system internal feature-mgr event-history msgs
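You can also sanity-check which features are actually enabled - if nothing is being toggled, a busy feature-mgr would be unexpected:
show feature | include enabled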
Hope that helps.
- Andrea
07-30-2017 04:27 AM
Friend,
Thank you for helping me. I am attaching most of the output from the commands.
What I observed is that when CPU usage goes up, Zabbix (which I use for monitoring) stops collecting and the graphs are left without data. I see it happen when traffic is high, but it has also happened with very low traffic. I only monitor interface traffic and CPU consumption, nothing else. Interface 1/49 is 40 Gbps and is connected to another Nexus 3064 running in layer 2, and that layer-2 Nexus connects to our ASR9001 edge router. The layer-2 Nexus carries the same traffic, but its CPU stays below 6% and I have no problems with it - only with the one in layer 3.
PE0-CISCO(config)# ethanalyzer local interface inband display-filter snmp limit-c 2000 > bootflash:SNMP.txt
Capturing on inband
278 packets captured
PE0-CISCO(config)# ethanalyzer local interface mgmt display-filter snmp limit-c 2000 > bootflash:SNMPmgmt.txt
PE0-CISCO(config-if)# show processes cpu sort | ex 0.0
PID Runtime(ms) Invoked uSecs 1Sec Process
----- ----------- -------- ----- ------ -----------
21440 332073490 1348887861 246 7.00% feature-mgr
23336 127118634 231152192 549 7.00% t2usd
27 408863539 649218089 629 6.00% ksmd
22742 92 45 2066 3.00% ecp
22801 121685168 882922885 137 3.00% ethpm
22741 3091469 23004552 134 2.00% eth_port_channel
13933 53348480 777243205 68 1.00% sysmgr
21642 118 75 1582 1.00% icmpv6
21687 285 126 2262 1.00% netstack
22850 153 217 705 1.00% ospf
CPU util : 12.76% user, 12.23% kernel, 75.00% idle
Software
BIOS: version 4.0.0
NXOS: version 7.0(3)I5(2)
BIOS compile time: 12/06/2016
NXOS image file is: bootflash:///nxos.7.0.3.I5.2.bin
NXOS compile time: 2/16/2017 8:00:00 [02/16/2017 17:03:27]
Hardware
cisco Nexus3064 Chassis
Intel(R) Celeron(R) CPU P4505 @ 1.87GHz with 3903216 kB of memory.
Processor Board ID FOC16256KTT
Device name: PE0-CISCO
bootflash: 1635720 kB
usb1: 0 kB (expansion flash)
Kernel uptime is 50 day(s), 22 hour(s), 58 minute(s), 41 second(s)
Last reset at 384678 usecs after Fri Jun 9 08:46:24 2017
PE0-CISCO(config)# sho interface ethernet 1/49
Ethernet1/49 is up
admin state is up, Dedicated Interface
Hardware: 40000 Ethernet, address: a44c.11b8.b518 (bia a44c.11b8.b518)
Description: PE0-CISCOxCORE-CISCO
MTU 1500 bytes, BW 40000000 Kbit, DLY 10 usec
reliability 255/255, txload 2/255, rxload 26/255
Encapsulation ARPA, medium is broadcast
Port mode is access
full-duplex, 40 Gb/s, media type is 40G
Beacon is turned off
Auto-Negotiation is turned on, FEC mode is Auto
Input flow-control is off, output flow-control is off
Auto-mdix is turned off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
Last link flapped 7week(s) 1day(s)
Last clearing of "show interface" counters never
1 interface resets
30 seconds input rate 4147105520 bits/sec, 407351 packets/sec
30 seconds output rate 454472536 bits/sec, 283323 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 4.15 Gbps, 407.58 Kpps; output rate 451.21 Mbps, 283.45 Kpps
RX
2643555793891 unicast packets 4415412 multicast packets 50025 broadcast packets
2643560259328 input packets 3295660538907904 bytes
0 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 52407 input discard
0 Rx pause
TX
1858487814940 unicast packets 2712839 multicast packets 67174 broadcast packets
1858490594953 output packets 370623804264085 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 3476 output discard
0 Tx pause
07-30-2017 07:41 AM
Fernando,
Anytime. Thanks for grabbing the outputs.
Based on the Ethanalyzer capture, I see quite a few IP addresses hitting the box - One (or more) of these will be owned by this particular Nexus.
Which one of these is your Zabbix? Are the others other monitoring tools, or should they not be polling the box at all? This may be something you want to look into (see the ACL sketch after the list) -
171.50.175.87
138.204.68.137
168.205.37.17
139.59.78.197
172.16.1.97
170.79.34.9
170.83.199.1
138.204.68.141
138.204.68.2
168.205.37.254
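If some of those should not be polling the box at all, one option - just a sketch, with the ACL name, server address, and community string as placeholders - is to bind an ACL to the SNMP community so only known stations are answered:
ip access-list SNMP-POLLERS
  permit udp host <zabbix-server-ip> any eq snmp
  deny udp any any eq snmp
snmp-server community <your-community> use-acl SNMP-POLLERS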
I also see quite a few output & input discards on Eth1/49 - is this where your SNMP traffic from Zabbix would be ingressing? If so, could you clear that interface's counters (clear counters interface ethernet 1/49) and, once they increment again, check these:
# show hardware internal bcm-usd port-stats slot-num 0 front-port 49
# show hardware internal forwarding l3 counters | sec "Port : 49"
# show hardware internal interface indiscard-stats front-port 49
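If any of those internal commands are not available on your release (they vary by platform and version), the following is a more portable way to see which queues are dropping:
show queuing interface ethernet 1/49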
Looks like feature-mgr is much lower in the latest output, and I do not see high SNMP utilization - any chance you can share "show processes cpu history"?
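Also, to gauge how heavy a single poll cycle is, you could time a bulk walk of the interface counters from the Zabbix server itself (net-snmp tooling; community string and switch IP are placeholders):
time snmpbulkwalk -v2c -c <community> <switch-ip> IF-MIB::ifHCInOctets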
- Andrea
07-31-2017 10:32 AM
Zabbix was monitoring the internal mgmt0 interface at IP 10.10.10.8. For the test, I pointed it at IP 168.205.37.254 instead. The other IPs are expected - they are connections to the PPPoE BRAS, clients with blocks of public IPs, and so on.
The command below does not have the bcm-usd and indiscard-stats options - what should I use instead?
PE0-CISCO(config)# show hardware internal ?
access-list buffer cpu-mac dev-version errors forwarding interface memory-ecc mgmt0 plog sprom version
bootflash cpu dev-port-map eobc fabric inband-rcpu logflash memory-model ns sensor statistics
PE0-CISCO(config)# show hardware internal forwarding l3 counters | sec "Port : 49"
Port : 49
Counters:
IfInOctets = 3386558104664949
IfInUcastPkts = 2715412006487
IfInNUcastPkts = 4525590
IfInDiscards = 52407
IfOutOctets = 379946477639562
IfOutUcastPkts = 1909160518313
IfOutNUcastPkts = 2814748
IfOutDiscards = 3476
IpInReceives = 2700090377270
IpForwDatagrams = 1908980314407
Dot1dTpPortInFrames = 2715416532077
Dot1dTpPortOutFrames = 1909163333061
EtherStatsMulticastPkts = 7218837
EtherStatsBroadcastPkts = 121501
EtherStatsPkts64Octets = 227838393012
EtherStatsPkts65to127Octets = 1701064400028
EtherStatsPkts128to255Octets = 104504672589
EtherStatsPkts256to511Octets = 85371944424
EtherStatsPkts512to1023Octets = 65372539369
EtherStatsPkts1024to1518Octets = 2440427915716
EtherStatsOctets = 3766504582304511
EtherStatsPkts = 4624579865138
EtherStatsTXNoErrors = 1909163333061
EtherStatsRXNoErrors = 2715416532079
IfInBroadcastPkts = 50027
IfInMulticastPkts = 4475563
IfOutBroadcastPkts = 71474
IfOutMulticastPkts = 2743274
BcmReceivedPkts64Octets = 37522543182
BcmReceivedPkts65to127Octets = 238786079701
BcmReceivedPkts128to255Octets = 58657367631
BcmReceivedPkts256to511Octets = 47702605621
BcmReceivedPkts512to1023Octets = 41092873211
BcmReceivedPkts1024to1518Octets = 2291655062731
BcmTransmittedPkts64Octets = 190315849830
BcmTransmittedPkts65to127Octets = 1462278320327
BcmTransmittedPkts128to255Octets = 45847304958
BcmTransmittedPkts256to511Octets = 37669338803
BcmTransmittedPkts512to1023Octets = 24279666158
BcmTransmittedPkts1024to1518Octets = 148772852985
bcmDbgCntTIPD4 = 10860116092
bcmDbgCntTL2_MTU = 3
PE0-CISCO(config)# show processes cpu history
121111111121121111111 11111122223212125222245676445553113454
913858351886607387713959317574062172904426649362881148137470
100
90
80 #
70 #
60 #### #
50 # ######### #
40 # ########### ####
30 # # ## # ############# ####
20 ## ### # ###### ### ## ############################ ####
10 ############################################################
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per second (last 60 seconds)
# = average CPU%
765686765766556765777767787658766767766766776776766666766557
635577208427364999050689790871754087769899098851869883484456
100
90 * *
80 * * * * * *** ** ** * * ** * *
70 * **** * * ** ********** *** ************** ***** ** *
60 ************ ******************************************** **
50 ************************************************************
40 ***************##***##***##******************#**************
30 ##***##***##**#################****##***###**##***#****##***
20 ############################################################
10 ############################################################
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per minute (last 60 minutes)
* = maximum CPU% # = average CPU%
111 1 1 11 11 11 11 1111 1111
999000909988899899999979988899999889099900990099989800999900980000990000
739000908934158256050859055610909290093400830096190200992500900000870000
100 * ******** ** ** * * * * * ** *** **** **** **** **********
90 ********** ** ****** ********** ***************** ********* **********
80 ************************************************************************
70 ********************************************#**************#****#****##*
60 ********************************************#**************##***##**###*
50 ********************************************#*********##***##**###**###*
40 *************************************#**#***#**#*****###**###**####*####
30 ##*******#******######*########*****###############*#########*##########
20 ########################################################################
10 ########################################################################
0....5....1....1....2....2....3....3....4....4....5....5....6....6....7.
0 5 0 5 0 5 0 5 0 5 0 5 0
CPU% per hour (last 72 hours)
* = maximum CPU% # = average CPU%
The IP that Zabbix polls is on port 1/49, under a VLAN:
interface Vlan1100
description PE0-CISCOxLAN-CORE-CISCO
no shutdown
management
no ip redirects
ip address 168.205.37.254/30
ip address 168.205.37.14/30 secondary
ipv6 address 2804:2728:0:1::2/64
ip router ospf 1 area 0.0.0.0
ipv6 router ospfv3 1 area 0.0.0.0
I could not clear the Vlan 1100 counters - on this layer-3 switch those counters are not working, and Zabbix shows no traffic for it. The layer-2 Nexus 3064 carries the VLAN traffic normally.